StarCluster - Mailing List Archive

Re: issues with adding multiple nodes to a running cluster

From: Justin Riley <no email>
Date: Tue, 03 Jan 2012 15:55:34 -0500

Ugh, this is definitely a bug. Another user reported it on the GitHub
issue tracker, and it has been fixed on GitHub. I'm releasing 0.93
today (skipping the 0.92.2 version given the amount of new stuff in this
release), which should fix this.

Will send an announcement once it's released. Stay tuned....

~Justin



On Tue 03 Jan 2012 03:53:39 PM EST, Wei Tao wrote:
> Hi all,
>
> From time to time, when I try to add nodes to a running StarCluster
> cluster using either the loadbalance or addnodes command, StarCluster
> misfires. For example, I set "-a 5" in loadbalance.
>
> command:
> starcluster loadbalance -m 20 -a 5 -n 1 <mycluster>
>
> here is what I got:
>
> >>> Loading full job history
> Cluster size: 10
> Queued jobs: 361
> Oldest queued job: 2012-01-03 20:13:56
> Avg job duration: 256 secs
> Avg job wait time: 167 secs
> Last cluster modification time: 2012-01-03 20:17:07
> >>> A job has been waiting for 963 sec, longer than max 900
> >>> *** ADDING 5 NODES at 2012-01-03 20:29:59.623917
> >>> Launching node(s): node010, node011, node012, node013, node014
> SpotInstanceRequest:sir-29586e14
> SpotInstanceRequest:sir-46e90414
> SpotInstanceRequest:sir-314a9814
> SpotInstanceRequest:sir-99387e14
> SpotInstanceRequest:sir-9ad72a14
> SpotInstanceRequest:sir-089dcc11
> SpotInstanceRequest:sir-09d28011
> SpotInstanceRequest:sir-64d4dc11
> SpotInstanceRequest:sir-45516411
> SpotInstanceRequest:sir-f2b31a11
> SpotInstanceRequest:sir-0198f214
> SpotInstanceRequest:sir-1db0a014
> SpotInstanceRequest:sir-49c97814
> SpotInstanceRequest:sir-94fdd414
> SpotInstanceRequest:sir-69db0014
> SpotInstanceRequest:sir-6f410612
> SpotInstanceRequest:sir-93c1c012
> SpotInstanceRequest:sir-e44c7c12
> SpotInstanceRequest:sir-dbc51012
> SpotInstanceRequest:sir-aa52dc12
> SpotInstanceRequest:sir-9f9e6811
> SpotInstanceRequest:sir-50053011
> SpotInstanceRequest:sir-33455211
> SpotInstanceRequest:sir-ffcdd011
> SpotInstanceRequest:sir-c1d7ee11
> >>> Waiting for node(s) to come up... (updating every 30s)
> >>> Waiting for open spot requests to become active...
> 34/34
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> >>> Waiting for all nodes to be in a 'running' state...
> 35/35
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> >>> Waiting for SSH to come up on all nodes...
> ^C/35 |||||||||||||||||||||||||||||||||||||||||||||||||||||||
> | 85%
>
> Instead of 5 nodes, 25 nodes were fired up. Has anyone experienced a
> similar issue? Is this a bug in the code, or am I missing something in
> my command?
>
> Thanks!
>
>
>
> --
> Wei Tao, Ph.D.
> TSI Biocomputing LLC
> 617-564-0934
>
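
For anyone wanting to check whether a captured run shows the same symptom, here is a minimal Python sketch that compares the number of node names on the "Launching node(s)" line with the number of SpotInstanceRequest lines that follow it. It only parses text in the format quoted above; the function name count_launch and the sample excerpt (truncated to the first six request IDs from the log) are illustrative assumptions, not part of StarCluster.

```python
import re

# Patterns matching the log lines quoted in the message above.
LAUNCH_RE = re.compile(r"Launching node\(s\):\s*(.+)")
SPOT_RE = re.compile(r"SpotInstanceRequest:(sir-[0-9a-f]+)")


def count_launch(log_text):
    """Return (node_names, spot_request_ids) found in captured output.

    Hypothetical helper for illustration; it reads plain text only and
    does not call StarCluster or EC2.
    """
    nodes, requests = [], []
    for line in log_text.splitlines():
        match = LAUNCH_RE.search(line)
        if match:
            # Node names are comma-separated on the launch line.
            nodes.extend(n.strip() for n in match.group(1).split(",") if n.strip())
        requests.extend(SPOT_RE.findall(line))
    return nodes, requests


# Truncated excerpt of the output quoted above (first six request IDs only).
sample = """\
>>> Launching node(s): node010, node011, node012, node013, node014
SpotInstanceRequest:sir-29586e14
SpotInstanceRequest:sir-46e90414
SpotInstanceRequest:sir-314a9814
SpotInstanceRequest:sir-99387e14
SpotInstanceRequest:sir-9ad72a14
SpotInstanceRequest:sir-089dcc11
"""

nodes, requests = count_launch(sample)
if len(requests) != len(nodes):
    print("Mismatch: %d node(s) named, %d spot request(s) opened"
          % (len(nodes), len(requests)))
```

Run against the full output above, the same check would count 5 node names and 25 spot requests, which is the discrepancy described in the message.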
Received on Tue Jan 03 2012 - 15:55:43 EST