StarCluster - Mailing List Archive

Re: 100 nodes cluster

From: Mark Gordon <no email>
Date: Thu, 20 Oct 2011 08:31:31 -0600

Hey Paolo:

Just throwing this out there as a possibility: the xargs command, when used
with the -P option, could be used to launch all the worker nodes at once (I am
sure there is some limit in each case as to the maximum number of concurrent
conversations a single machine could drive). See this blog post:
<http://blogs.oracle.com/cwalsh/entry/the_power_of_xargs>.
Python's GIL <http://docs.python.org/c-api/init.html#threads> prevents true
multi-threading in the language, so something like xargs is needed.
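To make the idea concrete, here is a minimal sketch (not StarCluster code) of
driving per-node work in parallel with xargs -P; the node names and the sleep
are placeholders standing in for the real node-configuration step:

```shell
# Run up to 4 "node setup" tasks concurrently with xargs -P.
# Each task echoes a start line, sleeps (placeholder work), then echoes done.
printf 'node%03d\n' 1 2 3 4 |
  xargs -P 4 -I {} sh -c 'echo "configuring {}"; sleep 1; echo "{} done"'
```

With -P 4 the four sleeps overlap, so the pipeline finishes in roughly one
second instead of four.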

Using xargs would require a change to StarCluster. The source is available
at GitHub: <https://github.com/jtriley/StarCluster>.

Thanks to Luke Tymowski at Cybera for the tip re xargs.

cheers,
Mark

Systems Analyst
Department of Physics
University of Alberta


On Thu, Oct 20, 2011 at 5:32 AM, Paolo Di Tommaso <Paolo.DiTommaso_at_crg.eu> wrote:

> Dear all,
>
> Thank you for your feedback, it has been very useful. The new StarCluster
> release 0.92 solves most of the problems.
>
> It is much more stable, and now I don't get any errors launching large
> clusters (with 100 or more instances).
>
> Anyway, the overall process is still very slow and, above all, the time
> required seems to grow linearly with the number of instances used.
>
> For example:
>
> - Launching 100 nodes, the configuration requires ~ 30 minutes to complete;
> - Launching 200 nodes, it requires ~ 1 hour;
>
> Since our target is launching that many nodes to run jobs that may
> require around 1 hour to complete, it would be meaningless to spend 50%
> or more of the time just configuring the system. The addnode command does
> not help because that process is even longer, since for each added node
> StarCluster needs to update /etc/hosts on every node.
>
>
> So the question is: would it not be possible to use pre-configured node
> images, to shorten the configuration steps as much as possible (ideally only
> the "/etc/hosts" files and the SGE update)?
>
>
> I'm thinking of something similar to:
>
> 1) Launch a 2-node configuration.
> 2) Save the master and the node instances as two new separate AMI images.
> 3) Use these images as pre-configured machines to deploy a large cluster,
> updating the "hosts" files (and whatever else is needed).
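For illustration only, the three steps above might look something like the
following with the AWS command-line tools. The command names, flags, and
instance/image ids here are assumptions for the sketch, not anything
StarCluster provides; RUN=echo keeps it a dry run:

```shell
RUN=echo                 # set RUN= (empty) to execute for real
MASTER_ID=i-11111111     # placeholder ids for the configured 2-node cluster
NODE_ID=i-22222222

# 2) save the configured master and worker as reusable AMIs
$RUN aws ec2 create-image --instance-id "$MASTER_ID" --name starcluster-master
$RUN aws ec2 create-image --instance-id "$NODE_ID"   --name starcluster-node

# 3) later, boot many workers from the pre-configured node image; only
#    /etc/hosts and the SGE registration would remain to be configured
$RUN aws ec2 run-instances --image-id ami-00000000 --count 100
```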
>
> This would avoid configuring all the nodes from scratch and reduce the
> overall start-up time.
>
>
> Does it make sense? Is it possible in some way? Maybe using a custom
> plugin?
>
>
> Cheers,
>
> Paolo Di Tommaso
> Software Engineer
> Comparative Bioinformatics Group
> Centre de Regulacio Genomica (CRG)
> Dr. Aiguader, 88
> 08003 Barcelona, Spain
>
>
>
>
>
>
> On Oct 17, 2011, at 5:59 PM, Rayson Ho wrote:
>
> 1) I agree with Matt; also, a 20-node cluster should be relatively
> error-free to bootstrap.
>
>
> 2) EC2 occasionally fails to start a node or two when requested to start a
> large number of nodes (instances), and I believe this has to do with how busy
> it is handling other requests as well. The best way to not overload EC2 is
> to start a few nodes at a time rather than the whole cluster all at once.
>
> In 0.92rc2, there is the addnode command:
>
> $ starcluster addnode mynewcluster
>
> The latest trunk introduces the ability to add multiple nodes, e.g. 3
> nodes:
>
> $ starcluster addnode -n 3 mycluster
>
> So instead of starting a 100-node cluster at start-up, try starting a
> 20- or 30-node one first, and then grow the cluster. For 0.92rc2, you may
> want to script the addnode command unless you enjoy typing :-D
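Such a script could be as simple as the loop below. This is a sketch assuming
the 0.92rc2 behaviour of one node per addnode call; the stub function stands
in for the real starcluster CLI so the loop can be dry-run:

```shell
# Stub standing in for the real CLI; delete this function to run for real.
starcluster() { echo "[dry-run] starcluster $*"; }

CLUSTER=mycluster
# Grow the cluster a few nodes at a time instead of all at once.
for i in 1 2 3; do
  starcluster addnode "$CLUSTER"
done
```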
>
>
> 3) I will do more scalability testing and hope to contribute
> scalability-related improvements to StarCluster in the near future. I am
> waiting for the EBS-based AMI so that I can start a large number of
> instances without breaking the bank - I am going to use my own AWS account,
> so I am interested in minimizing cost by using t1.micro (which is slower
> when running real work, but I am interested in the launch speed of EC2
> itself, so t1.micro seems to be perfect for my needs!).
>
> https://github.com/jtriley/StarCluster/issues/52
> http://mailman.mit.edu/pipermail/starcluster/2011-October/000818.html
>
> (To Justin: no pressure in getting the EBS AMI, I will be busy till mid
> Nov).
>
> Rayson
>
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
>
>
> ------------------------------
> *From:* Matthew Summers <quantumsummers_at_gentoo.org>
> *To:* "starcluster_at_mit.edu" <starcluster_at_mit.edu>
> *Sent:* Monday, October 17, 2011 10:58 AM
> *Subject:* Re: [StarCluster] 100 nodes cluster
>
> Are you guys running a versioned release or the HEAD from git? I am more
> than fairly certain this has been optimized in the repo, IIRC a few
> months ago.
>
> --
> Matthew W. Summers
> Gentoo Foundation Inc.
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Thu Oct 20 2011 - 10:31:32 EDT
This archive was generated by hypermail 2.3.0.
