Problem adding spot request nodes when using load balancer
Here is my load balancer output:
>>> Plotting stats to directory: /home/ubuntu/.starcluster/sge/m-1
>>> Loading full job history
Execution hosts: 1
Queued jobs: 29
Oldest queued job: 2012-09-18 21:39:16
Avg job duration: 600 secs
Avg job wait time: 1501 secs
Last cluster modification time: 2012-09-18 22:46:52
>>> A job has been waiting for 4060 sec, longer than max 900
*** WARNING - Adding 1 nodes at 2012-09-18 22:46:56.804876
>>> Launching node(s): node003
SpotInstanceRequest:sir-450d5211
>>> Waiting for node(s) to come up... (updating every 15s)
>>> Waiting for all nodes to be in a 'running' state...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 0.039 mins
!!! ERROR - Failed to add new host
Traceback (most recent call last):
  File "/home/ubuntu/maxwell_control/local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 675, in _eval_add_node
    self._cm.add_nodes(self._cluster.cluster_tag, need_to_add)
  File "/home/ubuntu/maxwell_control/local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 160, in add_nodes
    no_create=no_create)
  File "/home/ubuntu/maxwell_control/local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 823, in add_nodes
    node = self.get_node_by_alias(alias)
  File "/home/ubuntu/maxwell_control/local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 689, in get_node_by_alias
    raise exception.InstanceDoesNotExist(alias, label='node')
InstanceDoesNotExist: node 'node003' does not exist
>>> Done making graphs.
>>> Sleeping...(looping again in 30 secs)
Any ideas? Thanks in advance,
Jesse
Received on Tue Sep 18 2012 - 19:09:27 EDT