StarCluster - Mailing List Archive

Re: Issue creating a cluster of 30 nodes with starcluster

From: Justin Riley <no email>
Date: Wed, 09 Nov 2011 01:45:42 -0500

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sumita,

Unless you've specifically submitted a request to Amazon to increase
your instance limit[1], I suspect you're running into the default
20-instance limit for flat-rate instances that I mentioned earlier.

I would recommend trying spot instances[2]; they're usually cheaper
than flat-rate instances AND you can launch up to 100 of them by
default. To request a spot cluster, just pass the --bid option to the
start command:

$ starcluster start --bid 0.50 mycluster

This will place a $0.50 spot bid on each node in the cluster except
for the master. The master node is always launched as a flat-rate
instance for stability.
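As a rough illustration of the limit arithmetic above, here is a minimal Python sketch. The helper names plan_instances and fits_default_limits are hypothetical and not part of StarCluster. It assumes the default limits quoted in this thread (20 flat-rate, 100 spot) and that no other instances are running in the account. Under those assumptions a 30-node flat-rate cluster exceeds the flat-rate limit, while a spot cluster needs only one flat-rate instance (the master) plus 29 spot requests.

```python
# Hypothetical helpers; the limits are the defaults cited in this
# thread and assume no other instances are running in the account.
DEFAULT_FLAT_RATE_LIMIT = 20
DEFAULT_SPOT_LIMIT = 100


def plan_instances(cluster_size, use_spot):
    """Return (flat_rate, spot) instance counts for a cluster.

    With a spot bid the master is still launched flat-rate and every
    other node becomes a spot request; without a bid, all nodes are
    flat-rate.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if use_spot:
        return 1, cluster_size - 1
    return cluster_size, 0


def fits_default_limits(cluster_size, use_spot):
    """True if the cluster stays within both default limits."""
    flat_rate, spot = plan_instances(cluster_size, use_spot)
    return flat_rate <= DEFAULT_FLAT_RATE_LIMIT and spot <= DEFAULT_SPOT_LIMIT


if __name__ == "__main__":
    for use_spot in (False, True):
        print(use_spot, plan_instances(30, use_spot),
              fits_default_limits(30, use_spot))
```

For 30 nodes this prints "False (30, 0) False" followed by "True (1, 29) True".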

To help you decide on a reasonable spot bid, use the spothistory command:

$ starcluster spothistory m1.large
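One possible way to turn the spothistory output into a bid is sketched below. This is an assumed rule of thumb, not StarCluster's own logic: take the highest price you observed, add a safety margin, and round up to the next cent. The price samples would be copied by hand from the spothistory output; suggest_bid and its 20% default margin are hypothetical. Keep in mind that a spot request whose bid is below the current spot price is not fulfilled.

```python
from decimal import Decimal, ROUND_UP


def suggest_bid(prices, margin="0.20"):
    """Suggest a spot bid in dollars per hour (hypothetical heuristic).

    Takes the highest observed price, adds a fractional safety margin
    and rounds up to the next whole cent. Decimal avoids float
    rounding surprises when working with money.
    """
    if not prices:
        raise ValueError("need at least one price sample")
    peak = max(Decimal(str(p)) for p in prices)
    bid = peak * (Decimal(1) + Decimal(margin))
    return bid.quantize(Decimal("0.01"), rounding=ROUND_UP)


if __name__ == "__main__":
    # Made-up samples, standing in for values read off spothistory.
    print(suggest_bid([0.12, 0.15, 0.14]))  # 0.18
```

A larger margin trades a higher worst-case hourly price for a lower chance of the bid falling below the spot price.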

With that said, you can check which nodes have SSH up using:

$ starcluster listclusters --show-ssh-status
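If you want to spot-check individual nodes yourself, a minimal Python sketch is to test whether TCP port 22 accepts connections. This is only a rough approximation (an open port does not prove that sshd is ready to accept logins) and is not how StarCluster implements --show-ssh-status; the function names are hypothetical.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time.

    This only shows that something accepts connections on the port;
    it does not prove that sshd is ready to accept logins.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def nodes_without_ssh(hosts, port=22, timeout=3.0):
    """List the hosts whose SSH port could not be reached."""
    return [h for h in hosts if not ssh_port_open(h, port, timeout)]
```

Pass nodes_without_ssh the public DNS names of your nodes (for example as listed in the AWS console); any names it returns are the nodes still holding up the "Waiting for SSH" step.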

Also, you can *always* reboot all nodes in the cluster and completely
reconfigure it using the "restart" command:

$ starcluster restart mycluster
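The earlier advice in this thread (quoted below: terminate an instance stuck in 'pending' for 15min+, then run restart) can be sketched as a small filter. The (instance_id, state, launch_time) tuples are a hypothetical input format, for example transcribed from the AWS console; the 15-minute threshold comes from the thread, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the advice in this thread ("e.g. 15min+").
PENDING_THRESHOLD = timedelta(minutes=15)


def stuck_pending(instances, now, threshold=PENDING_THRESHOLD):
    """Return ids of instances that have been 'pending' too long.

    `instances` is a hypothetical list of (instance_id, state,
    launch_time) tuples with timezone-aware launch times.
    """
    return [
        instance_id
        for instance_id, state, launch_time in instances
        if state == "pending" and now - launch_time > threshold
    ]


if __name__ == "__main__":
    now = datetime(2011, 11, 8, 12, 0, tzinfo=timezone.utc)
    # Made-up instance ids and launch times for illustration.
    fleet = [
        ("i-00000001", "running", now - timedelta(minutes=30)),
        ("i-00000002", "pending", now - timedelta(minutes=22)),
        ("i-00000003", "pending", now - timedelta(minutes=5)),
    ]
    print(stuck_pending(fleet, now))  # ['i-00000002']
```

Each id it returns is a candidate to terminate from the AWS console before running "starcluster restart mycluster".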

HTH,

~Justin

[1] http://aws.amazon.com/contact-us/ec2-request/
[2] http://aws.amazon.com/ec2/spot-instances/

On 11/08/2011 07:20 PM, Sumita Sinha wrote:
> Hi Justin,
>
> I tried again to create a 30-node cluster and found something new.
> I have been waiting for the last 20 minutes for the cluster to come
> up, and I get the message below. In EC2 all of the nodes are
> currently up and running, but I don't know which node is taking
> time for SSH configuration, so I am not able to restart or
> terminate a node.
>
> >>> Using default cluster template: smallcluster
> >>> Validating cluster template settings...
> >>> Cluster template settings are valid
> >>> Starting cluster...
> >>> Launching a 30-node cluster...
> >>> Creating security group @sc-smallcluster...
> Reservation:r-0e2d7060
> >>> Waiting for cluster to come up... (updating every 30s)
> >>> Waiting for all nodes to be in a 'running' state...
> 29/29 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> >>> Waiting for SSH to come up on all nodes...
> 28/29 |------------------------------------------------------------- | 96%
>
>
> Regards Sumita
>
> On Tue, Nov 8, 2011 at 7:42 PM, Justin Riley <jtriley_at_mit.edu
> <mailto:jtriley_at_mit.edu>> wrote:
>
>
> Hi Sumita,
>
> Were you using spot instances? If not, I believe there's a default
> limit of 20 flat-rate instances, which *could* be related to your
> issue. With spot instances you can create up to 100 instances by
> default. So, if you need more than 20 nodes and do not wish to
> submit a request to Amazon to increase your flat-rate instance
> limit, you should be using spot instances:
>
> $ starcluster start -s 30 -b 0.50 mycluster
>
> That said, StarCluster itself places no limit on the number of nodes
> you can create. However, as you've seen, EC2 instances sometimes
> take longer than usual to become 'running'. Unfortunately, this is
> purely an EC2 back-end issue that StarCluster cannot resolve
> directly. In my experience 22 minutes *is* quite a while to wait for
> any instance to come up; however, I have had instances take up to 15
> min in the past, so this is not a total surprise to me.
>
> In the future, if you run into this problem of waiting too long
> (e.g. 15min+) for an instance to change from 'pending' to 'running',
> I would recommend simply terminating the faulty instance from the
> AWS console and then restarting the cluster using:
>
> $ starcluster restart mycluster
>
> This should reboot all of the currently running instances and begin
> configuring the cluster, so you avoid having to terminate the entire
> cluster and lose instance hours.
>
> HTH,
>
> ~Justin
>
>
> On 11/8/11 6:39 AM, Sumita Sinha wrote:
>> Hello,
>
>> I am currently working with StarCluster on EC2.
>
>> I tried creating a cluster of 30 m1.small nodes using AMI
>> ami-8cf913e5. Cluster creation never completed, as I found that one
>> node, node025, was still showing a pending status.
>> I waited for almost 22 minutes, then terminated the cluster. The
>> cluster was terminated properly. Is there any limit on the number
>> of nodes that can be created?
>
>
>
>
>> -- Regards Sumita Sinha
>
>
>
>
>
>
>
> -- Regards Sumita Sinha
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk66IZYACgkQ4llAkMfDcrmZ5ACeIPTP8ZiFKTlTNxif6SgIKsWm
SmoAnA08GWFcOcmpCF+MMHwLzhqzD0Va
=KFye
-----END PGP SIGNATURE-----
Received on Wed Nov 09 2011 - 01:45:44 EST