StarCluster - Mailing List Archive

Re: Help needed with running mpich2 plugin using starcluster

From: Subbarao Kota <no email>
Date: Sun, 26 Feb 2012 14:54:29 -0500

Hi Justin, thanks so much for taking the time to look into this.

Every time I start the cluster using StarCluster, I receive the errors shown in the attached file. I have also attached my config file; some values, such as the access keys, have been replaced.

1. Why am I receiving these errors? See the attached file (cluster_startup_errors) for the errors; the crash_report is also attached.
2. I included a [plugin mpich2] section in the config file, but starting the cluster with this setting doesn't seem to run the plugin. Why is that?

My goal is to run some benchmarks (specifically STREAM (the MPI version in C), IOR, and the NAS MPI benchmarks) on Amazon EC2 instances. Once I can run these successfully on a single instance, my plan is to run them on 2, 4, 6, and 8 nodes to evaluate their performance. I need to run them with MPICH2.

So far I have been able to run the STREAM and IOR benchmarks on a single t1.micro instance. Now I want to create a small 2-node cluster and run them there. I think StarCluster can help me accomplish this.
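
Once the cluster and the MPICH2 plugin are working, I expect the benchmark runs themselves to look roughly like the sketch below (the source file, hostfile path, and binary names are only placeholders for illustration, not files StarCluster creates for me):

    # on the master node, as the cluster user (names/paths are placeholders)
    mpicc -O2 stream_mpi.c -o stream_mpi      # build the MPI STREAM benchmark
    mpiexec -f ~/hosts -n 2 ./stream_mpi      # MPICH2 (Hydra): -f <hostfile>, -n <number of MPI processes>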

Thanks,
Subbarao Kota.




SS-MBP:~ sinsub$ starcluster start -x -u ec2-user t1-micro-trial-cluster
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.1)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Validating existing instances...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Starting stopped node: master
>>> Starting stopped node: node001
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.056 mins
>>> The master node is ec2-50-16-8-10.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
error occurred in job (id=node001): Garbage packet received
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/node.py", line 678, in set_hostname
    hostname_file = self.ssh.remote_file("/etc/hostname", "w")
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 290, in remote_file
    rfile = self.sftp.open(file, mode)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 180, in sftp
    self._sftp = paramiko.SFTPClient.from_transport(self.transport)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 106, in from_transport
    return cls(chan)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 87, in __init__
    server_version = self._send_version()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 108, in _send_version
    t, data = self._read_packet()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 179, in _read_packet
    raise SFTPError('Garbage packet received')
SFTPError: Garbage packet received

error occurred in job (id=master): Garbage packet received
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/node.py", line 678, in set_hostname
    hostname_file = self.ssh.remote_file("/etc/hostname", "w")
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 290, in remote_file
    rfile = self.sftp.open(file, mode)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 180, in sftp
    self._sftp = paramiko.SFTPClient.from_transport(self.transport)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 106, in from_transport
    return cls(chan)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 87, in __init__
    server_version = self._send_version()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 108, in _send_version
    t, data = self._read_packet()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 179, in _read_packet
    raise SFTPError('Garbage packet received')
SFTPError: Garbage packet received


!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to: /Users/sinsub/.starcluster/logs/crash-report-9647.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu

####################################
## StarCluster Configuration File ##
####################################

[global]
# configure the default cluster template to use when starting a cluster
# defaults to 'smallcluster' defined below. this template should be usable
# out-of-the-box provided you've configured your keypair correctly
DEFAULT_TEMPLATE=t1-micro-trial-cluster
# enable experimental features for this release
ENABLE_EXPERIMENTAL=True
# number of seconds to wait when polling instances (default: 30s)
#REFRESH_INTERVAL=15
# specify a web browser to launch when viewing spot history plots
#WEB_BROWSER=chromium

[aws info]
# This is the AWS credentials section.
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = [REPLACED]
AWS_SECRET_ACCESS_KEY = [REPLACED]
# replace this with your account number
AWS_USER_ID= [REPLACED]
# Uncomment to specify a different Amazon AWS region (OPTIONAL)
# (defaults to us-east-1 if not specified)
# NOTE: AMIs have to be migrated!
#AWS_REGION_NAME = eu-west-1
#AWS_REGION_HOST = ec2.eu-west-1.amazonaws.com
# Uncomment these settings when creating an instance-store (S3) AMI (OPTIONAL)
#EC2_CERT = /path/to/your/cert-asdf0as9df092039asdfi02089.pem
#EC2_PRIVATE_KEY = /path/to/your/pk-asdfasd890f200909.pem
# Uncomment these settings to use a proxy host when connecting to AWS
#aws_proxy = your.proxyhost.com
#aws_proxy_port = 8080
#aws_proxy_user = yourproxyuser
#aws_proxy_pass = yourproxypass

# Sections starting with "key" define your keypairs
# (see the EC2 getting started guide tutorial on using ec2-add-keypair to learn
# how to create new keypairs)
# Section name should match your key name e.g.:

[key mykey]

KEY_LOCATION= ~/.ssh/mykey.rsa

# You can of course have multiple keypair sections
# [key my-other]
# KEY_LOCATION=/home/myuser/.ssh/id_rsa-my-other-gsg-keypair
# Sections starting with "cluster" define your cluster templates
# Section name is the name you give to your cluster template e.g.:
#[cluster smallcluster]

 [cluster t1-micro-trial-cluster]
# change this to the name of one of the keypair sections defined above

KEYNAME = mykey

# number of ec2 instances to launch
CLUSTER_SIZE = 2

# create the following user on the cluster
CLUSTER_USER = ec2-user

PLUGINS = mpich2

# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash

# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-31814f58
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, cc2.8xlarge, m1.large, m1.xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = t1.micro

# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True

# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
MASTER_INSTANCE_TYPE = t1.micro

# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
MASTER_IMAGE_ID = ami-31814f58

# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c

# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
#VOLUMES = myvol1

[plugin mpich2]

setup_class = starcluster.plugins.mpich2.MPICH2Setup

# list of plugins to load after StarCluster's default setup routines (OPTIONAL)
# see "Configuring StarCluster Plugins" below on how to define plugin sections

# [cluster t1-micro-trial-cluster]
#PLUGINS = mpich2
#KEYNAME = mykey
#NODE_INSTANCE_TYPE = t1.micro
#CLUSTER_SIZE = 2
#NODE_IMAGE_ID = ami-31814f58

# list of permissions (or firewall rules) to apply to the cluster's security
# group (OPTIONAL).
#PERMISSIONS = ssh, http

# Uncomment to always create a spot cluster when creating a new cluster from
# this template. The following example will place a $0.50 bid for each spot
# request.
#SPOT_BID = 0.50

###########################################
## Defining Additional Cluster Templates ##
###########################################

# You can also define multiple cluster templates.
# You can either supply all configuration options as with smallcluster above,
# or create an EXTENDS=<cluster_name> variable in the new cluster section to
# use all settings from <cluster_name> as defaults. Below are a couple of
# example cluster templates that use the EXTENDS feature:

# [cluster mediumcluster]
# Declares that this cluster uses smallcluster as defaults
# EXTENDS=smallcluster
# This section is the same as smallcluster except for the following settings:
# KEYNAME=my-other-gsg-keypair
# NODE_INSTANCE_TYPE = c1.xlarge
# CLUSTER_SIZE=8
# VOLUMES = biodata2

# [cluster largecluster]
# Declares that this cluster uses mediumcluster as defaults
# EXTENDS=mediumcluster
# This section is the same as mediumcluster except for the following variables:
# CLUSTER_SIZE=16
#############################
## Configuring EBS Volumes ##
#############################

# A new [volume] section must be created for each EBS volume you wish to use
# with StarCluster. The section name is a tag for your volume. This tag is used
# in the VOLUMES setting of a cluster template to declare that an EBS volume is
# to be mounted and nfs shared on the cluster. (see the commented VOLUMES
# setting in the example 'smallcluster' template above)
# Below are some examples of defining and configuring EBS volumes to be used
# with StarCluster:

# Sections starting with "volume" define your EBS volumes
# Section name tags your volume e.g.:
# [volume myvol1]
# (attach 1st partition of volume vol-c9999999 to /home on master node)
# VOLUME_ID = vol-c9999999
# MOUNT_PATH = /home

# Same volume as above, but mounts to different location
# [volume biodata2]
# (attach 1st partition of volume vol-c9999999 to /opt/ on master node)
# VOLUME_ID = vol-c999999
# MOUNT_PATH = /opt/

# Another volume example
# [volume oceandata]
# (attach 1st partition of volume vol-d7777777 to /mydata on master node)
# VOLUME_ID = vol-d7777777
# MOUNT_PATH = /mydata

# Same as oceandata only uses the 2nd partition instead
# [volume oceandata]
# (attach 2nd partition of volume vol-d7777777 to /mydata on master node)
# VOLUME_ID = vol-d7777777
# MOUNT_PATH = /mydata
# PARTITION = 2

#####################################
## Configuring StarCluster Plugins ##
#####################################

# Sections starting with "plugin" define a custom python class which can
# perform additional configurations to StarCluster's default routines. These
# plugins can be assigned to a cluster template to customize the setup
# procedure when starting a cluster from this template
# (see the commented PLUGINS setting in the 'smallcluster' template above)
# Below is an example of defining a plugin called 'myplugin':

# [plugin myplugin]
# myplugin module either lives in ~/.starcluster/plugins or is
# in your PYTHONPATH
# SETUP_CLASS = myplugin.SetupClass
# extra settings are passed as arguments to your plugin:
# SOME_PARAM_FOR_MY_PLUGIN = 1
# SOME_OTHER_PARAM = 2

############################################
## Configuring Security Group Permissions ##
############################################

# [permission ssh]
# protocol can be: tcp, udp, or icmp
# protocol = tcp
# from_port = 22
# to_port = 22
# cidr_ip = <your_ip>/32

# example for opening port 80 on the cluster to a specific IP range
# [permission http]
# protocol = tcp
# from_port = 80
# to_port = 80
# cidr_ip = 18.0.0.0/24

SS-MBP:~ sinsub$ cat .starcluster/logs/crash-report-9647.txt
---------- CRASH DETAILS ----------
COMMAND: starcluster start -x -u ec2-user t1-micro-trial-cluster
2012-02-26 14:39:03,678 PID: 9647 config.py:551 - DEBUG - Loading config
2012-02-26 14:39:03,678 PID: 9647 config.py:118 - DEBUG - Loading file: /Users/sinsub/.starcluster/config
2012-02-26 14:39:03,681 PID: 9647 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': None, 'port': None}
2012-02-26 14:39:04,197 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {}
2012-02-26 14:39:04,197 PID: 9647 cluster.py:673 - DEBUG - adding node i-b84074dd to self._nodes list
2012-02-26 14:39:04,198 PID: 9647 cluster.py:673 - DEBUG - adding node i-ba4074df to self._nodes list
2012-02-26 14:39:04,198 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:39:04,198 PID: 9647 cluster.py:513 - INFO - Validating existing instances...
2012-02-26 14:39:04,198 PID: 9647 cluster.py:909 - DEBUG - Launch map: node001 (ami: ami-31814f58, type: t1.micro)...
2012-02-26 14:39:04,198 PID: 9647 cluster.py:1515 - INFO - Validating cluster template settings...
2012-02-26 14:39:04,576 PID: 9647 cluster.py:909 - DEBUG - Launch map: node001 (ami: ami-31814f58, type: t1.micro)...
2012-02-26 14:39:04,576 PID: 9647 cluster.py:1530 - INFO - Cluster template settings are valid
2012-02-26 14:39:04,577 PID: 9647 cluster.py:1406 - INFO - Starting cluster...
2012-02-26 14:39:04,645 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:39:04,646 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:39:04,646 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:39:04,646 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:39:04,646 PID: 9647 cluster.py:1412 - INFO - Starting stopped node: master
2012-02-26 14:39:04,848 PID: 9647 cluster.py:1412 - INFO - Starting stopped node: node001
2012-02-26 14:39:05,285 PID: 9647 cluster.py:1218 - INFO - Waiting for cluster to come up... (updating every 30s)
2012-02-26 14:39:05,459 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:39:05,459 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:39:05,459 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:39:05,459 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:39:05,459 PID: 9647 cluster.py:1176 - INFO - Waiting for all nodes to be in a 'running' state...
2012-02-26 14:39:05,533 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:39:05,533 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:39:05,533 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:39:05,534 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:39:35,643 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:39:35,644 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:39:35,644 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:39:35,644 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:40:05,721 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:40:05,721 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:40:05,722 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:40:05,722 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:40:05,722 PID: 9647 cluster.py:1194 - INFO - Waiting for SSH to come up on all nodes...
2012-02-26 14:40:05,797 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:40:05,797 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:40:05,797 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:40:05,797 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:40:05,879 PID: 9647 ssh.py:75 - DEBUG - loading private key /Users/sinsub/.ssh/mykey.rsa
2012-02-26 14:40:05,880 PID: 9647 ssh.py:160 - DEBUG - Using private key /Users/sinsub/.ssh/mykey.rsa (rsa)
2012-02-26 14:40:05,880 PID: 9647 ssh.py:97 - DEBUG - connecting to host ec2-50-16-8-10.compute-1.amazonaws.com on port 22 as user root
2012-02-26 14:40:07,356 PID: 9647 ssh.py:75 - DEBUG - loading private key /Users/sinsub/.ssh/mykey.rsa
2012-02-26 14:40:07,357 PID: 9647 ssh.py:160 - DEBUG - Using private key /Users/sinsub/.ssh/mykey.rsa (rsa)
2012-02-26 14:40:07,357 PID: 9647 ssh.py:97 - DEBUG - connecting to host ec2-23-20-111-97.compute-1.amazonaws.com on port 22 as user root
2012-02-26 14:40:08,642 PID: 9647 utils.py:89 - INFO - Waiting for cluster to come up took 1.056 mins
2012-02-26 14:40:08,642 PID: 9647 cluster.py:1433 - INFO - The master node is ec2-50-16-8-10.compute-1.amazonaws.com
2012-02-26 14:40:08,642 PID: 9647 cluster.py:1434 - INFO - Setting up the cluster...
2012-02-26 14:40:08,708 PID: 9647 cluster.py:665 - DEBUG - existing nodes: {u'i-ba4074df': <Node: node001 (i-ba4074df)>, u'i-b84074dd': <Node: master (i-b84074dd)>}
2012-02-26 14:40:08,708 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-b84074dd in self._nodes
2012-02-26 14:40:08,709 PID: 9647 cluster.py:668 - DEBUG - updating existing node i-ba4074df in self._nodes
2012-02-26 14:40:08,709 PID: 9647 cluster.py:681 - DEBUG - returning self._nodes = [<Node: master (i-b84074dd)>, <Node: node001 (i-ba4074df)>]
2012-02-26 14:40:08,709 PID: 9647 clustersetup.py:94 - INFO - Configuring hostnames...
2012-02-26 14:40:08,713 PID: 9647 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-02-26 14:40:08,714 PID: 9647 ssh.py:179 - DEBUG - creating sftp connection
2012-02-26 14:40:08,714 PID: 9647 ssh.py:179 - DEBUG - creating sftp connection
2012-02-26 14:40:09,715 PID: 9647 threadpool.py:123 - INFO - Shutting down threads...
2012-02-26 14:40:09,720 PID: 9647 threadpool.py:135 - DEBUG - unfinished_tasks = 6
2012-02-26 14:40:10,722 PID: 9647 cli.py:266 - DEBUG - error occurred in job (id=node001): Garbage packet received
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/node.py", line 678, in set_hostname
    hostname_file = self.ssh.remote_file("/etc/hostname", "w")
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 290, in remote_file
    rfile = self.sftp.open(file, mode)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 180, in sftp
    self._sftp = paramiko.SFTPClient.from_transport(self.transport)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 106, in from_transport
    return cls(chan)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 87, in __init__
    server_version = self._send_version()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 108, in _send_version
    t, data = self._read_packet()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 179, in _read_packet
    raise SFTPError('Garbage packet received')
SFTPError: Garbage packet received

error occurred in job (id=master): Garbage packet received
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/node.py", line 678, in set_hostname
    hostname_file = self.ssh.remote_file("/etc/hostname", "w")
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 290, in remote_file
    rfile = self.sftp.open(file, mode)
  File "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py", line 180, in sftp
    self._sftp = paramiko.SFTPClient.from_transport(self.transport)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 106, in from_transport
    return cls(chan)
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line 87, in __init__
    server_version = self._send_version()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 108, in _send_version
    t, data = self._read_packet()
  File "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 179, in _read_packet
    raise SFTPError('Garbage packet received')
SFTPError: Garbage packet received

---------- SYSTEM INFO ----------
StarCluster: 0.93.1
Python: 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
Platform: Darwin-11.3.0-x86_64-i386-64bit
boto: 2.0
paramiko: 1.7.7.1 (George)
Crypto: 2.5
jinja2: 2.5.5
decorator: 3.3.1


On Feb 22, 2012, at 11:51 AM, Justin Riley wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Subbarao,
>
> Which AMI are you using? This error usually occurs because you're
> using an AMI that does not allow root logins.
>
> In general you should use either the StarCluster supported AMIs or an
> AMI based on the StarCluster supported AMIs. Have a look at the
> following doc on how to customize the StarCluster supported AMIs:
>
> http://web.mit.edu/star/cluster/docs/latest/manual/create_new_ami.html
>
> Also, do you still have the 'crash report' that was generated when you
> got this error? It would be useful to take a look at this as well if
> it's still around....
>
> ~Justin
>
> On 02/20/2012 02:57 PM, Subbarao Kota wrote:
>> Hello, I received the below error when attempted to run the mpich2
>> plugin on a 2-node cluster of t1.micro instance via starcluster.
>> Can you please let me know what needs to be fixed or corrected?
>> what additional information you will need to help.
>>
>> Please advise. Thanks.
>>
>> ====
>>
>> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.1) Software
>> Tools for Academics and Researchers (STAR) Please submit bug
>> reports to starcluster_at_mit.edu
>>
>>>>> Running plugin mpich2 Creating MPICH2 hosts file
>> !!! ERROR - Error occurred while running plugin 'mpich2': Traceback
>> (most recent call last): File
>> "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/cluster.py",
>> line 1482, in run_plugin func(*args) File
>> "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/plugins/mpich2.py",
>> line 33, in run mpich2_hosts =
>> master.ssh.remote_file(self.MPICH2_HOSTS, 'w') File
>> "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py",
>> line 290, in remote_file rfile = self.sftp.open(file, mode) File
>> "/Library/Python/2.7/site-packages/StarCluster-0.93.1-py2.7.egg/starcluster/ssh.py",
>> line 180, in sftp self._sftp =
>> paramiko.SFTPClient.from_transport(self.transport) File
>> "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line
>> 106, in from_transport return cls(chan) File
>> "build/bdist.macosx-10.7-intel/egg/paramiko/sftp_client.py", line
>> 87, in __init__ server_version = self._send_version() File
>> "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 108, in
>> _send_version t, data = self._read_packet() File
>> "build/bdist.macosx-10.7-intel/egg/paramiko/sftp.py", line 179, in
>> _read_packet raise SFTPError('Garbage packet received') SFTPError:
>> Garbage packet received
>> _______________________________________________ StarCluster mailing
>> list StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk9FHPUACgkQ4llAkMfDcrluFwCfcrRe9sW8RRHCmHhK9z4BRp6a
> 5eAAn0nVlMsvg1OQY+RjELzq3Vo5krnQ
> =Eby5
> -----END PGP SIGNATURE-----
Received on Sun Feb 26 2012 - 14:54:32 EST