Hello,
I ran into this error when running "starcluster removenode" on a
50-node cluster. I've pasted the output below and attached the gzipped
log. (I have not included the crash report because it is too large to
post.)
-Robert
---------------------------------------------------------
...
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Removing node037 from known_hosts files
>>> Removing node037 from /etc/hosts
>>> Removing node037 from NFS
>>> Terminating node: node037 (i-f1754bb6)
>>> Removing node node038 (i-ff754bb8)...
>>> Removing node038 from SGE
>>> Updating SGE parallel environment 'orte'
13/13 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
error occurred in job (id=139888273798912): failed to connect to host ec2-50-18-148-224.us-west-1.compute.amazonaws.com on port 22
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/clustersetup.py", line 351, in <lambda>
    num_processors = sum(self.pool.map(lambda n: n.num_processors, nodes))
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/node.py", line 169, in num_processors
    'cat /proc/cpuinfo | grep processor | wc -l')[0])
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 512, in execute
    channel = self.transport.open_session()
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 129, in transport
    port=self._port, timeout=self._timeout)
  File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93.1-py2.6.egg/starcluster/ssh.py", line 103, in connect
    raise exception.SSHConnectionError(host, port)
SSHConnectionError: failed to connect to host ec2-50-18-148-224.us-west-1.compute.amazonaws.com on port 22
!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to:
/home/ryu/.starcluster/logs/crash-report-14078.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu
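
The traceback above ends in an SSH connection failure on port 22 while the processor count was being collected from the nodes. For anyone trying to reproduce this, a minimal sketch of a standalone reachability check might look like the following. It is not StarCluster code; the helper names `ssh_port_open` and `unreachable_hosts` are hypothetical, and it only tests whether a TCP connection to the port succeeds, not whether SSH authentication works.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not perform an SSH handshake.
    """
    try:
        # create_connection resolves the host and connects; the context
        # manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def unreachable_hosts(hosts, port=22, timeout=5.0):
    """Return the subset of hosts that do not accept TCP connections on port."""
    return [h for h in hosts if not ssh_port_open(h, port, timeout)]
```

Running `unreachable_hosts` over the public DNS names of the cluster nodes would show whether the host named in the error is the only one not accepting connections.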
--
Robert Yu, Member Technical Staff
www.aditazz.com | robert.yu_at_aditazz.com
1111 Bayhill Drive Suite 260 | San Bruno | CA 94066
510.459.0216 | cell
650.627.7357 | 650.492.7000 x1008 | work
650.684.1149 | fax
Received on Fri Feb 24 2012 - 13:03:15 EST