StarCluster - Mailing List Archive

issues with adding multiple nodes to a running cluster

From: Wei Tao <no email>
Date: Tue, 3 Jan 2012 15:53:39 -0500

Hi all,

From time to time, when I try to add nodes to a running StarCluster cluster using
either the loadbalance or the addnodes command, StarCluster misfires. For
example, I set "-a 5" in loadbalance:

command:
  starcluster loadbalance -m 20 -a 5 -n 1 <mycluster>

Here is what I got:

>>> Loading full job history
Cluster size: 10
Queued jobs: 361
Oldest queued job: 2012-01-03 20:13:56
Avg job duration: 256 secs
Avg job wait time: 167 secs
Last cluster modification time: 2012-01-03 20:17:07
>>> A job has been waiting for 963 sec, longer than max 900
>>> *** ADDING 5 NODES at 2012-01-03 20:29:59.623917
>>> Launching node(s): node010, node011, node012, node013, node014
SpotInstanceRequest:sir-29586e14
SpotInstanceRequest:sir-46e90414
SpotInstanceRequest:sir-314a9814
SpotInstanceRequest:sir-99387e14
SpotInstanceRequest:sir-9ad72a14
SpotInstanceRequest:sir-089dcc11
SpotInstanceRequest:sir-09d28011
SpotInstanceRequest:sir-64d4dc11
SpotInstanceRequest:sir-45516411
SpotInstanceRequest:sir-f2b31a11
SpotInstanceRequest:sir-0198f214
SpotInstanceRequest:sir-1db0a014
SpotInstanceRequest:sir-49c97814
SpotInstanceRequest:sir-94fdd414
SpotInstanceRequest:sir-69db0014
SpotInstanceRequest:sir-6f410612
SpotInstanceRequest:sir-93c1c012
SpotInstanceRequest:sir-e44c7c12
SpotInstanceRequest:sir-dbc51012
SpotInstanceRequest:sir-aa52dc12
SpotInstanceRequest:sir-9f9e6811
SpotInstanceRequest:sir-50053011
SpotInstanceRequest:sir-33455211
SpotInstanceRequest:sir-ffcdd011
SpotInstanceRequest:sir-c1d7ee11
>>> Waiting for node(s) to come up... (updating every 30s)
>>> Waiting for open spot requests to become active...
34/34 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for all nodes to be in a 'running' state...
35/35 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for SSH to come up on all nodes...
^C/35 ||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
 85%

Instead of 5 nodes, 25 nodes were launched. Has anyone experienced a similar
issue? Is this a bug in the code, or am I missing something in my command?
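
To check the mismatch directly from captured output like the above, a minimal sketch in Python follows. It assumes the loadbalance output has been saved as plain text. The function name count_launch_mismatch and the shortened sample string are hypothetical and not part of StarCluster. It only compares the number of node names on the "Launching node(s)" line with the number of SpotInstanceRequest lines printed after it.

```python
import re


def count_launch_mismatch(log_text):
    """Return (requested_nodes, spot_requests) parsed from saved loadbalance output.

    Hypothetical helper for illustration only; not part of StarCluster.
    """
    requested = 0
    # The "Launching node(s)" line lists one alias per node being added.
    match = re.search(r"Launching node\(s\): (.+)", log_text)
    if match:
        requested = len([n for n in match.group(1).split(",") if n.strip()])
    # Each spot request is printed on its own line.
    spot_requests = len(
        re.findall(r"^SpotInstanceRequest:sir-\w+", log_text, flags=re.MULTILINE)
    )
    return requested, spot_requests


# Shortened sample: five node aliases but six spot request lines.
sample = """>>> Launching node(s): node010, node011, node012, node013, node014
SpotInstanceRequest:sir-29586e14
SpotInstanceRequest:sir-46e90414
SpotInstanceRequest:sir-314a9814
SpotInstanceRequest:sir-99387e14
SpotInstanceRequest:sir-9ad72a14
SpotInstanceRequest:sir-089dcc11
"""

requested, spot_requests = count_launch_mismatch(sample)
if spot_requests != requested:
    print("Mismatch: %d nodes requested, %d spot requests" % (requested, spot_requests))
```

Run against the full output above, the same comparison would show five node aliases against the 25 spot request lines.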

Thanks!



-- 
Wei Tao, Ph.D.
TSI Biocomputing LLC
617-564-0934
Received on Tue Jan 03 2012 - 15:53:40 EST