1) I agree with Matt; also, a 20-node cluster should be relatively error-free to bootstrap.
2) EC2 occasionally fails to start a node or two when asked to start a large number of nodes (instances), and I believe this depends on how busy it is handling other requests. The best way to avoid overloading EC2 is to start a few nodes at a time rather than the whole cluster at once.
In 0.92rc2, there is the addnode command:
$ starcluster addnode mynewcluster
The latest trunk introduces the ability to add multiple nodes, e.g. 3 nodes:
$ starcluster addnode -n 3 mycluster
So instead of starting a 100-node cluster all at once, try starting a 20- or 30-node one first, and then grow the cluster. For 0.92rc2, you may want to script the addnode command unless you enjoy typing :-D
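As a rough sketch of what such a script could look like (not something shipped with StarCluster): the function name grow_cluster, the STARCLUSTER override variable, the batch size and the pause length are all made up for illustration. The only real command used is the single-node "starcluster addnode <cluster>" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: grow an already-running cluster a few nodes at a time.
# STARCLUSTER can be overridden (e.g. STARCLUSTER=echo) for a dry run.
STARCLUSTER="${STARCLUSTER:-starcluster}"

grow_cluster() {
    cluster="$1"   # name of the existing cluster
    total="$2"     # how many nodes to add overall
    batch="$3"     # how many nodes to add before pausing
    pause="$4"     # seconds to wait between batches

    added=0
    while [ "$added" -lt "$total" ]; do
        # In 0.92rc2 each addnode call adds a single node.
        if ! $STARCLUSTER addnode "$cluster"; then
            echo "addnode failed after $added node(s)" >&2
            return 1
        fi
        added=$((added + 1))
        # Pause after each full batch, but not after the last node.
        if [ $((added % batch)) -eq 0 ] && [ "$added" -lt "$total" ]; then
            sleep "$pause"
        fi
    done
}
```

For example, after sourcing the file, "grow_cluster mycluster 70 5 60" would add 70 nodes, resting 60 seconds after every 5. The numbers are guesses and would need tuning. With the trunk version, the inner loop could presumably be replaced by one "addnode -n" call per batch.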
3) I will do more scalability testing and hope to contribute scalability-related improvements to StarCluster in the near future. I am waiting for the EBS-based AMI so that I can start a large number of instances without breaking the bank. I am going to use my own AWS account, so I want to minimize cost by using t1.micro (which is slower when running real work, but I am interested in the launch speed of EC2 itself, so t1.micro seems to be perfect for my needs!).
https://github.com/jtriley/StarCluster/issues/52
http://mailman.mit.edu/pipermail/starcluster/2011-October/000818.html
(To Justin: no pressure in getting the EBS AMI, I will be busy till mid Nov).
Rayson
=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net
________________________________
From: Matthew Summers <quantumsummers_at_gentoo.org>
To: "starcluster_at_mit.edu" <starcluster_at_mit.edu>
Sent: Monday, October 17, 2011 10:58 AM
Subject: Re: [StarCluster] 100 nodes cluster
Are you guys running a versioned release or the HEAD on git? I am more
than fairly certain this has been optimized in the repo, iirc a few
months ago.
--
Matthew W. Summers
Gentoo Foundation Inc.
_______________________________________________
StarCluster mailing list
StarCluster_at_mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster
Barcelona, Spain
>
> --
> Luis M. Carril
> Project Technician
> Galicia Supercomputing Center (CESGA)
> Avda. de Vigo s/n
> 15706 Santiago de Compostela
> SPAIN
>
> Tel: 34-981569810 ext 249
> lmcarril_at_cesga.es
> www.cesga.es
Received on Mon Oct 17 2011 - 11:59:57 EDT