Re: Issue creating a cluster of 30 nodes with starcluster
Hi Justin,
Thanks for your response.
1. Currently my account has an instance limit of 100.
2. I will try with spot instances to check the speed.
3. As per one of my earlier queries:
Does starcluster wait for all the nodes to be up and then start
configuring them all at one time?
Is there any parameter in the config file, or any option to the starcluster
start command, that makes configuring the cluster (installing SGE,
configuring NFS) a parallel operation? That is, no node should wait
for the other nodes to be up before getting configured, so that if we post a job
on a ready node it starts executing with however many nodes are
running and configured at that point.
If the above is not possible, is there any specific reason why, while starting
a cluster, starcluster configures the nodes only when all of them are
running?
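To make the question concrete, here is a rough Python sketch of the behaviour I have in mind. It is not StarCluster code and does not use any StarCluster API: the function names, the configure callback and the polling values are my own assumptions, used only for illustration. Each node is configured as soon as its SSH port accepts a connection, instead of waiting for the slowest node.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def configure_as_ready(hosts, configure, is_up=ssh_port_open,
                       poll_interval=30.0, max_wait=1200.0):
    """Configure each host as soon as it is reachable.

    `configure` is a hypothetical per-node setup step (for example,
    joining the node to the queue). Returns two lists:
    (configured hosts, hosts still not reachable when max_wait expired).
    """
    pending = list(hosts)
    configured = []
    deadline = time.monotonic() + max_wait
    while pending:
        # Iterate over a copy so hosts can be removed during the pass.
        for host in list(pending):
            if is_up(host):
                configure(host)
                configured.append(host)
                pending.remove(host)
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    return configured, pending
```

With something like this, the second list would also show exactly which node is holding up the cluster, so that only that node needs to be terminated.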
Regards
Sumita
On Wed, Nov 9, 2011 at 12:15 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Sumita,
>
> Unless you've specifically submitted a request to Amazon to increase
> your instance limit[1] I can't help but suspect that you're likely
> running into issues because of the default 20 instance limit for
> flat-rate instances I mentioned earlier.
>
> I would recommend trying with spot instances[2]; they're usually
> cheaper than the flat-rate(s) AND you can launch up to 100 of them. To
> request a spot cluster just pass the --bid option to the start command:
>
> $ starcluster start --bid 0.50 mycluster
>
> This will place a $0.50 spot bid on each node in the cluster except
> for the master. The master node is always launched as a flat-rate
> instance for stability.
>
> To help you decide a decent spot bid use the spot history command:
>
> $ starcluster spothistory m1.large
>
> With that said you can check which nodes have SSH up using:
>
> $ starcluster listclusters --show-ssh-status
>
> Also, you can *always* restart and reboot all nodes in the cluster and
> completely reconfigure the cluster using the "restart" command:
>
> $ starcluster restart mycluster
>
> HTH,
>
> ~Justin
>
> [1] http://aws.amazon.com/contact-us/ec2-request/
> [2] http://aws.amazon.com/ec2/spot-instances/
>
> On 11/08/2011 07:20 PM, Sumita Sinha wrote:
> > Hi Justin,
> >
> > I again tried creating a 30-node cluster and figured out something
> > new. I have been waiting for the last 20 min for the cluster to be up. I get
> > the below message. Currently in EC2 all the nodes are up and
> > running; I don't know which node is taking time for SSH
> > configuration, so I am not able to restart or terminate a node.
> >
> > >>> Using default cluster template: smallcluster
> > >>> Validating cluster template settings...
> > >>> Cluster template settings are valid
> > >>> Starting cluster...
> > >>> Launching a 30-node cluster...
> > >>> Creating security group _at_sc-smallcluster...
> > Reservation:r-0e2d7060
> > >>> Waiting for cluster to come up... (updating every 30s)
> > >>> Waiting for all nodes to be in a 'running' state...
> > 29/29 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> > >>> Waiting for SSH to come up on all nodes...
> > 28/29 |------------------------------------------------------------- | 96%
> >
> >
> > Regards Sumita
> >
> > On Tue, Nov 8, 2011 at 7:42 PM, Justin Riley <jtriley_at_mit.edu
> > <mailto:jtriley_at_mit.edu>> wrote:
> >
> >
> > Hi Sumita,
> >
> > Were you using spot instances? If not, I believe there's a default
> > limit of 20 instances for flat-rate instances, which
> > *could* be related to your issue. With spot instances you can
> > create up to 100 instances by default. So, if you need more than
> > 20 nodes and do not wish to submit a request to Amazon to increase
> > your flat-rate instance limit, you should be using spot instances:
> >
> > $ starcluster start -s 30 -b 0.50 mycluster
> >
> > With that said, StarCluster has no limit to the number of nodes you
> > can create, however, as you've seen, sometimes EC2 instances can
> > take longer to become 'running' than usual. Unfortunately this is
> > purely an EC2 back-end issue that cannot be resolved directly by
> > StarCluster. In my experience 22 minutes *is* quite a while to wait
> > for any instance to come up, however, I have had instances take up
> > to 15 min in the past so this is not a total surprise to
> > me.
> >
> > In the future if you run into this problem of waiting for an
> > instance to change from 'pending' to 'running' for too long (e.g.
> > 15min+) I would recommend simply terminating the faulty instance
> > from the AWS console and then restart the cluster using:
> >
> > $ starcluster restart mycluster
> >
> > This should reboot all the currently running instances and begin
> > configuring the cluster and avoid having to terminate the entire
> > cluster and lose instance hours.
> >
> > HTH,
> >
> > ~Justin
> >
> >
> > On 11/8/11 6:39 AM, Sumita Sinha wrote:
> >> Hello,
> >
> >> Currently working with starcluster on EC2.
> >
> >> Tried creating a cluster with 30 nodes of type m1.small using
> >> AMI ami-8cf913e5.
> >> Cluster creation never completed, as I found that one node,
> >> node025, was showing pending status.
> >> I waited for almost 22 minutes, then terminated the cluster.
> >> The cluster was terminated properly. Is there any limit on the
> >> number of nodes that can be created?
> >
> >
> >
> >
> >> -- Regards Sumita Sinha
> >
> >
> >
> >
> >
> >
> >
> > -- Regards Sumita Sinha
> >
> >
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk66IZYACgkQ4llAkMfDcrmZ5ACeIPTP8ZiFKTlTNxif6SgIKsWm
> SmoAnA08GWFcOcmpCF+MMHwLzhqzD0Va
> =KFye
> -----END PGP SIGNATURE-----
>
--
Regards
Sumita Sinha
Received on Wed Nov 09 2011 - 02:30:41 EST