Re: Issue creating a cluster of 30 nodes with starcluster
I tried creating a 30-node cluster again and found something new. I have been
waiting for the last 20 minutes for the cluster to come up.
I get the message below. In EC2 all the nodes are currently up and running,
and I don't know which node is holding up the SSH step,
so I am not able to restart or terminate that node.
>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 30-node cluster...
>>> Creating security group @sc-smallcluster...
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
>>> Waiting for SSH to come up on all nodes...
28/29 |------------------------------------------------------------- |
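One way to narrow down which node is holding things up would be to probe the SSH port on each instance by hand. Below is a minimal Python sketch of that idea. It assumes you copy the public DNS names of the instances from the AWS console and that port 22 is reachable from the machine you run it on. The function names are made up for illustration and are not part of StarCluster. An open port only shows that something is listening, not that a login would succeed.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def nodes_without_ssh(hosts, port=22, timeout=3.0):
    """Return the hosts that do not yet accept TCP connections on the port."""
    return [h for h in hosts if not ssh_port_open(h, port, timeout)]
```

Calling nodes_without_ssh() with the list of instance DNS names returns the ones that are not answering yet. Any node it reports would be the candidate to terminate from the AWS console.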
On Tue, Nov 8, 2011 at 7:42 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> Hi Sumita,
> Were you using spot instances? If not, I believe there's a default limit of
> 20 instances for flat-rate instances, which *could* be related to
> your issue. With spot instances you can create up to 100 instances by
> default. So, if you need more than 20 nodes and do not wish to submit a
> request to Amazon to increase your flat-rate instance limit, you should
> use spot instances:
> $ starcluster start -s 30 -b 0.50 mycluster
> With that said, StarCluster has no limit on the number of nodes you can
> create. However, as you've seen, EC2 instances can sometimes take longer
> than usual to become 'running'. Unfortunately this is purely an EC2 back-end
> issue that cannot be resolved directly by StarCluster. In my experience 22
> minutes *is* quite a while to wait for any instance to come up; however, I
> have had instances take up to 15 min in the past, so this is not a
> total surprise to me.
> In the future, if you run into this problem of waiting too long for an
> instance to change from 'pending' to 'running' (e.g. 15min+), I would
> recommend simply terminating the faulty instance from the AWS console and
> then restarting the cluster using:
> $ starcluster restart mycluster
> This should reboot all the currently running instances and begin
> configuring the cluster, which avoids having to terminate the entire cluster
> and lose instance hours.
> On 11/8/11 6:39 AM, Sumita Sinha wrote:
> > Hello,
> > I am currently working with StarCluster on EC2.
> > I tried creating a cluster with 30 nodes of type m1.small using AMI -
> > Cluster creation never completed; I found that one node,
> > node025, was showing pending status.
> > I waited for almost 22 minutes, then terminated the cluster.
> > The cluster was terminated properly. Is there any limit on the number of
> > nodes that can be created?
> > --
> > Regards
> > Sumita Sinha
Received on Tue Nov 08 2011 - 19:20:33 EST