StarCluster - Mailing List Archive

Crash report

From: Daniel Povey <no email>
Date: Thu, 21 Feb 2013 12:59:29 -0500

I have attached a crash report.
I think this may be an error in mapping a node alias to its public hostname.
The host ec2-23-22-72-123.compute-1.amazonaws.com was not node004, which I
was trying to remove; it was node003.
Do you think the GitHub version would be better than the released version at
the moment? I already have the latest release.
Dan


>>> Removing node004 from SGE
!!! ERROR - command 'source /etc/profile && qconf -de node004' failed with status 1
!!! ERROR - command 'pkill -9 sge_execd' failed with status 1
>>> Updating SGE parallel environment 'orte'
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
error occurred in job (id=139906233857792): failed to connect to host ec2-23-22-72-123.compute-1.amazonaws.com on port 22
Traceback (most recent call last):
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/plugins/sge.py", line 50, in <lambda>
    num_processors = sum(self.pool.map(lambda n: n.num_processors, nodes))
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py", line 169, in num_processors
    'cat /proc/cpuinfo | grep processor | wc -l')[0])
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 519, in execute
    channel = self.transport.open_session()
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 136, in transport
    port=self._port, timeout=self._timeout)
  File "/opt/lib/python2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 103, in connect
    raise exception.SSHConnectionError(host, port)
SSHConnectionError: failed to connect to host ec2-23-22-72-123.compute-1.amazonaws.com on port 22
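Reading the traceback, the unreachable host appears to be contacted while "Updating SGE parallel environment 'orte'", where the SGE plugin (sge.py, line 50) sums `num_processors` over a list of nodes using a thread pool. The sketch below is hypothetical: `FakeNode`, `total_processors`, and the stand-in `SSHConnectionError` are illustrative names, not StarCluster's real classes, and it uses Python's standard `multiprocessing.pool.ThreadPool` in place of StarCluster's own thread pool. It only illustrates the general pattern: if any single node in that list cannot be reached over SSH, the whole parallel map raises and no total is produced.

```python
from multiprocessing.pool import ThreadPool


class SSHConnectionError(Exception):
    """Hypothetical stand-in for StarCluster's SSHConnectionError."""

    def __init__(self, host, port):
        Exception.__init__(
            self, "failed to connect to host %s on port %d" % (host, port))
        self.host = host
        self.port = port


class FakeNode(object):
    """Hypothetical stand-in for a cluster node (not StarCluster's Node)."""

    def __init__(self, alias, dns_name, cpus, reachable=True):
        self.alias = alias
        self.dns_name = dns_name
        self._cpus = cpus
        self.reachable = reachable

    @property
    def num_processors(self):
        # The real property runs 'cat /proc/cpuinfo | grep processor | wc -l'
        # over SSH; here we only simulate whether the connection succeeds.
        if not self.reachable:
            raise SSHConnectionError(self.dns_name, 22)
        return self._cpus


def total_processors(nodes, workers=4):
    """Sum processor counts across all nodes in parallel threads.

    ThreadPool.map re-raises an exception from any worker in the caller,
    so one unreachable node aborts the entire sum.
    """
    pool = ThreadPool(workers)
    try:
        return sum(pool.map(lambda n: n.num_processors, nodes))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    nodes = [
        FakeNode("master", "master.example.invalid", 2),
        FakeNode("node003", "node003.example.invalid", 2, reachable=False),
    ]
    try:
        print(total_processors(nodes))
    except SSHConnectionError as e:
        print("processor count aborted: %s" % e)
```

Under these assumptions, the error names whichever node in the list failed, which need not be the node being removed.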



Received on Thu Feb 21 2013 - 12:59:30 EST