On Fri, Oct 28, 2011 at 10:44 AM, Paolo Di Tommaso
> Hi Gordon,
> Starting a 100-node cluster takes 30 minutes (and 1 hour with 200 nodes).
> Using an EBS-backed AMI, the machines' boot time is very short (less than 1
> minute) and, above all, constant: it does not grow with the number of
> requested instances.
> So all of the time is spent configuring the cluster.
> StarCluster does a lot of tasks automatically (and for this reason I love
> But by saving the state of a configured cluster, another cluster instance
> could be deployed by updating only the /etc/hosts files and the SGE queue
> configuration. This would greatly reduce the total amount of time required to
> Does it make sense?
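To make the "update only /etc/hosts and the SGE queue" idea concrete, here is a minimal sketch of how the per-node delta between two cluster sizes might be computed. Everything in it is an assumption for illustration: the helper names are hypothetical (they are not StarCluster functions), the aliases are assumed to follow a master, node001, node002, ... pattern, and the IP addresses are made up.

```python
# Hypothetical helpers for illustration only; not part of StarCluster.


def node_aliases(size):
    """Aliases for a cluster of `size` nodes: master, node001, node002, ...

    The naming pattern is an assumption made for this sketch.
    """
    return ["master"] + ["node%03d" % i for i in range(1, size)]


def hosts_entries(ip_by_alias):
    """Render one /etc/hosts line ("ip alias") per node, sorted by alias."""
    return ["%s %s" % (ip, alias) for alias, ip in sorted(ip_by_alias.items())]


def alias_delta(old_size, new_size):
    """Return (to_add, to_remove): aliases that differ between two sizes."""
    old = set(node_aliases(old_size))
    new = set(node_aliases(new_size))
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    # Growing from the saved 2-node image to a 4-node cluster.
    to_add, to_remove = alias_delta(2, 4)
    print("add:", to_add)
    print("remove:", to_remove)
    # Made-up private IPs, only to show the rendered lines.
    for line in hosts_entries({"master": "10.0.0.1", "node001": "10.0.0.2"}):
        print(line)
```

The same list of added or removed aliases is what would have to be registered with or dropped from SGE. Note that the private IPs of a freshly launched cluster will most likely differ from those baked into the saved images, so in practice every /etc/hosts line would probably need regenerating, not just the lines for new nodes.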
> On Oct 28, 2011, at 4:24 PM, Mark Gordon wrote:
> Hi Paolo:
> I wonder, what percentage of the launch time do you think is spent
> configuring the nodes?
> On Fri, Oct 28, 2011 at 4:57 AM, Paolo Di Tommaso <Paolo.DiTommaso_at_crg.eu>
>> Dear All,
>> I'm still struggling with the problem that large clusters take such a
>> long time to launch.
>> I think some improvements are possible with better multithread
>> handling, but I'm not a Python guru, so I cannot say much about that in detail.
>> Anyway, I'm looking for a more "radical" approach. My idea is to launch a
>> 2-node cluster, save the master and slave nodes as two separate AMIs, and use
>> these to deploy a cluster of any size without having to install and
>> configure everything from scratch (NFS, SGE, passwordless access, etc.),
>> modifying only what has changed.
>> So my question is: what is the "delta" in the configuration files
>> between two different cluster instances of X and Y nodes?
>> Knowing this, it could be quite easy to write a StarCluster plugin that
>> applies only these changes, achieving a much faster launch time.
>> Thank you,
>> Paolo Di Tommaso
>> Software Engineer
>> Comparative Bioinformatics Group
>> Centre de Regulacio Genomica (CRG)
>> Dr. Aiguader, 88
>> 08003 Barcelona, Spain
>> On Oct 20, 2011, at 9:48 PM, Rayson Ho wrote:
>> > ----- Original Message -----
>> >> However, if one can wrap around the real ssh with a fake ssh script that
>> >> sleeps 30 seconds and then runs the real ssh, then we can see how good (or
>> >> bad) the Workerpool handles long latency commands - and we will start from
>> >> there to optimize the launch performance.
>> > Replying to myself - after quickly reading the code...
>> > StarCluster uses Paramiko instead of executing ssh, so wrapping ssh in a
>> > long-latency script won't work.
>> > And there are quite a lot of discussions about issues with multithreaded
>> > programs that call Paramiko -- just google: Paramiko+multithreading
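Since wrapping ssh is not an option, the question of how a worker pool copes with long-latency commands can still be explored in isolation. The sketch below is only a stand-in: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` rather than StarCluster's own Workerpool, and `time.sleep` rather than a real Paramiko call, so it says nothing about Paramiko's threading behaviour. The function names `slow_command` and `timed_run` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_command(latency):
    """Stand-in for a remote command: just sleeps for `latency` seconds."""
    time.sleep(latency)
    return latency


def timed_run(num_tasks, pool_size, latency):
    """Run `num_tasks` slow commands on `pool_size` threads.

    Returns (elapsed_seconds, results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(slow_command, [latency] * num_tasks))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Compare a few pool sizes for the same simulated workload.
    for pool_size in (1, 4, 8):
        elapsed, _ = timed_run(num_tasks=8, pool_size=pool_size, latency=0.1)
        print("pool_size=%d elapsed=%.2fs" % (pool_size, elapsed))
```

With a pure sleep the elapsed time should shrink roughly in proportion to the pool size. If the real launch does not scale that way, that would suggest the bottleneck is somewhere other than the pool's scheduling.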
>> > Rayson
>> > =================================
>> > Grid Engine / Open Grid Scheduler
>> > http://gridscheduler.sourceforge.net
>> > _______________________________________________
>> > StarCluster mailing list
>> > StarCluster_at_mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/starcluster
> Mark Gordon
> Systems Analyst
> Department of Physics
> University of Alberta
What version of StarCluster are you using, Paolo?
Matthew W. Summers
Gentoo Foundation Inc.
Received on Fri Oct 28 2011 - 12:07:37 EDT