StarCluster - Mailing List Archive

removenode error

From: Robert Yu <no email>
Date: Fri, 24 Feb 2012 10:02:43 -0800

Hello,

I ran into this error when running "starcluster removenode" on a
50-node cluster.  I've pasted the output below and attached the
gzipped log.  (I have not included the crash report because it is too
big to post.)

-Robert
---------------------------------------------------------

...

>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Removing node037 from known_hosts files
>>> Removing node037 from /etc/hosts
>>> Removing node037 from NFS
>>> Terminating node: node037 (i-f1754bb6)
>>> Removing node node038 (i-ff754bb8)...
>>> Removing node038 from SGE
>>> Updating SGE parallel environment 'orte'
13/13 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
error occurred in job (id=139888273798912): failed to connect to host ec2-50-18-148-224.us-west-1.compute.amazonaws.com on port 22
Traceback (most recent call last):
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/threadpool.py", line 31, in run
   job.run()
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/threadpool.py", line 58, in run
   r = self.method(*self.args, **self.kwargs)
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/clustersetup.py", line 351, in <lambda>
   num_processors = sum(self.pool.map(lambda n: n.num_processors, nodes))
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/node.py", line 169, in num_processors
   'cat /proc/cpuinfo | grep processor | wc -l')[0])
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 512, in execute
   channel = self.transport.open_session()
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 129, in transport
   port=self._port, timeout=self._timeout)
 File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 103, in connect
   raise exception.SSHConnectionError(host, port)
SSHConnectionError: failed to connect to host ec2-50-18-148-224.us-west-1.compute.amazonaws.com on port 22


!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to: /home/ryu/.starcluster/logs/crash-report-14078.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu
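
The traceback above shows a thread-pool map of num_processors over the
node list being aborted by a single SSH connection failure.  The output
does not say why that host was unreachable.  As an illustration only,
and not StarCluster's actual code or fix, here is a minimal sketch of
summing per-node processor counts while tolerating unreachable nodes.
The names ConnectionFailed, total_processors and fake_count are
hypothetical, and fake_count merely simulates one unreachable node.

```python
from concurrent.futures import ThreadPoolExecutor


class ConnectionFailed(Exception):
    """Hypothetical stand-in for an SSH connection error."""


def total_processors(nodes, count_fn, max_workers=8):
    """Sum count_fn(node) over nodes, skipping unreachable ones.

    Returns (total, unreachable), where unreachable lists the nodes
    whose probe raised ConnectionFailed.
    """
    def safe_count(node):
        # Catch the failure inside the worker so that one bad node
        # does not abort the whole map.
        try:
            return node, count_fn(node), None
        except ConnectionFailed as exc:
            return node, 0, exc

    total, unreachable = 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input.
        for node, count, err in pool.map(safe_count, nodes):
            if err is not None:
                unreachable.append(node)
            total += count
    return total, unreachable


def fake_count(node):
    # Hypothetical probe: pretend node038 cannot be reached and every
    # other node reports 8 processors.
    if node == "node038":
        raise ConnectionFailed(node)
    return 8
```

Calling total_processors(["node001", "node037", "node038"], fake_count)
returns the total for the reachable nodes together with a list of the
nodes that could not be contacted, so the caller can decide whether to
retry or to drop them.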


--
Robert Yu, Member Technical Staff
www.aditazz.com | robert.yu_at_aditazz.com
1111 Bayhill Drive Suite 260 | San Bruno | CA 94066
510.459.0216 | cell
650.627.7357 | 650.492.7000 x1008 | work
650.684.1149 | fax



Received on Fri Feb 24 2012 - 13:03:15 EST