StarCluster - Mailing List Archive

Bug report

From: Shpungin, Joseph <no email>
Date: Thu, 4 Oct 2012 13:25:36 -0400

Hi,
My name is Joe Shpungin.
I attempted to use StarCluster to launch a test cluster, but the attempt failed with the following log. Could you please let me know what went wrong?

---------- CRASH DETAILS ----------
COMMAND: starcluster start CLUSTER1 --bid 0.20
2012-10-04 17:06:20,903 PID: 23118 config.py:551 - DEBUG - Loading config
2012-10-04 17:06:20,903 PID: 23118 config.py:118 - DEBUG - Loading file: /home/ec2-user/.starcluster/config
2012-10-04 17:06:20,907 PID: 23118 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': None, 'port': None}
2012-10-04 17:06:21,025 PID: 23118 start.py:176 - INFO - Using default cluster template: smallcluster
2012-10-04 17:06:21,025 PID: 23118 base.py:147 - WARNING - ************************************************************
2012-10-04 17:06:21,029 PID: 23118 base.py:151 - WARNING - Waiting 5 seconds before continuing...
2012-10-04 17:06:21,029 PID: 23118 base.py:152 - WARNING - Press CTRL-C to cancel...
2012-10-04 17:06:26,036 PID: 23118 cluster.py:1539 - INFO - Validating cluster template settings...
2012-10-04 17:06:26,376 PID: 23118 cluster.py:926 - DEBUG - Launch map: node001 (ami: ami-899d49e0, type: m1.small)...
2012-10-04 17:06:26,376 PID: 23118 cluster.py:1555 - INFO - Cluster template settings are valid
2012-10-04 17:06:26,377 PID: 23118 cluster.py:1427 - INFO - Starting cluster...
2012-10-04 17:06:26,377 PID: 23118 cluster.py:952 - INFO - Launching a 2-node cluster...
2012-10-04 17:06:26,377 PID: 23118 cluster.py:926 - DEBUG - Launch map: node001 (ami: ami-899d49e0, type: m1.small)...
2012-10-04 17:06:26,378 PID: 23118 cluster.py:1004 - INFO - Launching master node (ami: ami-899d49e0, type: m1.small)...
2012-10-04 17:06:26,442 PID: 23118 awsutils.py:165 - INFO - Creating security group @sc-CLUSTER1...
2012-10-04 17:06:28,001 PID: 23118 cluster.py:772 - INFO - Reservation:r-26ce3940
2012-10-04 17:06:28,002 PID: 23118 cluster.py:926 - DEBUG - Launch map: node001 (ami: ami-899d49e0, type: m1.small)...
2012-10-04 17:06:28,002 PID: 23118 cluster.py:1024 - INFO - Launching node001 (ami: ami-899d49e0, type: m1.small)
2012-10-04 17:06:28,151 PID: 23118 cluster.py:772 - INFO - SpotInstanceRequest:sir-8fd42811
2012-10-04 17:06:28,151 PID: 23118 cluster.py:1235 - INFO - Waiting for cluster to come up... (updating every 30s)
2012-10-04 17:06:28,543 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {}
2012-10-04 17:06:28,543 PID: 23118 cluster.py:672 - DEBUG - adding node i-a5a334d8 to self._nodes list
2012-10-04 17:06:29,877 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:06:29,877 PID: 23118 cluster.py:1193 - INFO - Waiting for all nodes to be in a 'running' state...
2012-10-04 17:06:29,963 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:06:29,963 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:06:29,964 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:07:00,172 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:07:00,173 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:07:00,173 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:07:00,173 PID: 23118 cluster.py:1211 - INFO - Waiting for SSH to come up on all nodes...
2012-10-04 17:07:00,254 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:07:00,254 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:07:00,254 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:07:00,349 PID: 23118 __init__.py:75 - DEBUG - loading private key /home/ec2-user/.ssh/STAR_KEY
2012-10-04 17:07:00,349 PID: 23118 __init__.py:82 - DEBUG - specified key does not end in either rsa or dsa, trying both
2012-10-04 17:07:00,350 PID: 23118 __init__.py:167 - DEBUG - Using private key /home/ec2-user/.ssh/STAR_KEY (rsa)
2012-10-04 17:07:00,351 PID: 23118 __init__.py:97 - DEBUG - connecting to host ec2-23-20-80-59.compute-1.amazonaws.com on port 22 as user root
2012-10-04 17:07:33,958 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:07:33,959 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:07:33,959 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:07:34,039 PID: 23118 __init__.py:97 - DEBUG - connecting to host ec2-23-20-80-59.compute-1.amazonaws.com on port 22 as user root
2012-10-04 17:08:04,156 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:08:04,157 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:08:04,157 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>]
2012-10-04 17:08:04,229 PID: 23118 __init__.py:97 - DEBUG - connecting to host ec2-23-20-80-59.compute-1.amazonaws.com on port 22 as user root
2012-10-04 17:08:05,196 PID: 23118 __init__.py:186 - DEBUG - creating sftp connection
2012-10-04 17:08:06,213 PID: 23118 utils.py:93 - INFO - Waiting for cluster to come up took 1.634 mins
2012-10-04 17:08:06,214 PID: 23118 cluster.py:1454 - INFO - The master node is ec2-23-20-80-59.compute-1.amazonaws.com
2012-10-04 17:08:06,215 PID: 23118 cluster.py:1455 - INFO - Setting up the cluster...
2012-10-04 17:08:06,377 PID: 23118 cluster.py:664 - DEBUG - existing nodes: {u'i-a5a334d8': <Node: master (i-a5a334d8)>}
2012-10-04 17:08:06,378 PID: 23118 cluster.py:667 - DEBUG - updating existing node i-a5a334d8 in self._nodes
2012-10-04 17:08:06,378 PID: 23118 cluster.py:672 - DEBUG - adding node i-e9ae3994 to self._nodes list
2012-10-04 17:08:07,510 PID: 23118 cluster.py:680 - DEBUG - returning self._nodes = [<Node: master (i-a5a334d8)>, <Node: node001 (i-e9ae3994)>]
2012-10-04 17:08:07,511 PID: 23118 clustersetup.py:90 - INFO - Configuring hostnames...
2012-10-04 17:08:07,516 PID: 23118 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-04 17:08:07,517 PID: 23118 __init__.py:75 - DEBUG - loading private key /home/ec2-user/.ssh/STAR_KEY
2012-10-04 17:08:07,517 PID: 23118 __init__.py:82 - DEBUG - specified key does not end in either rsa or dsa, trying both
2012-10-04 17:08:07,519 PID: 23118 __init__.py:167 - DEBUG - Using private key /home/ec2-user/.ssh/STAR_KEY (rsa)
2012-10-04 17:08:07,520 PID: 23118 __init__.py:186 - DEBUG - creating sftp connection
2012-10-04 17:08:07,520 PID: 23118 __init__.py:97 - DEBUG - connecting to host on port 22 as user root
2012-10-04 17:08:08,517 PID: 23118 threadpool.py:123 - INFO - Shutting down threads...
2012-10-04 17:08:08,518 PID: 23118 threadpool.py:135 - DEBUG - unfinished_tasks = 20
2012-10-04 17:08:09,520 PID: 23118 cli.py:266 - DEBUG - error occurred in job (id=node001): failed to connect to host on port 22
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py", line 696, in set_hostname
    hostname_file = self.ssh.remote_file("/etc/hostname", "w")
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 296, in remote_file
    rfile = self.sftp.open(file, mode)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 187, in sftp
    self._sftp = ssh.SFTPClient.from_transport(self.transport)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 136, in transport
    port=self._port, timeout=self._timeout)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 103, in connect
    raise exception.SSHConnectionError(host, port)
SSHConnectionError: failed to connect to host on port 22

---------- SYSTEM INFO ----------
StarCluster: 0.93.3
Python: 2.6.8 (unknown, Jun 29 2012, 06:50:56) [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)]
Platform: Linux-3.2.21-1.32.6.amzn1.x86_64-x86_64-with-glibc2.2.5
boto: 2.3.0
ssh: 1.7.13
Crypto: 2.6
jinja2: 2.6
decorator: 3.3.1

Thank you,
-Joe

joe_shpungin_at_merck.com



Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
New Jersey, USA 08889), and/or its affiliates (direct contact information
for affiliates is available at
http://www.merck.com/contact/contacts.html) that may be confidential,
proprietary, copyrighted and/or legally privileged. It is intended solely
for the use of the individual or entity named on this message. If you are
not the intended recipient, and have received this message in error,
please notify us immediately by reply e-mail and then delete it from
your system.
Received on Thu Oct 04 2012 - 13:25:41 EDT
