Re: 100 nodes cluster

From: Matthew Summers <no email>
Date: Fri, 28 Oct 2011 11:07:36 -0500

On Fri, Oct 28, 2011 at 10:44 AM, Paolo Di Tommaso
<Paolo.DiTommaso_at_crg.eu> wrote:
> Hi Gordon,
> Starting a 100-node cluster takes 30 minutes (and 1 hour with 200 nodes).
> Using an EBS-backed AMI, the machines' boot time is very short, less than 1
> minute, and above all constant (it does not grow with the number of
> requested instances).
> So all the time is spent configuring the cluster.
> StarCluster does a lot of tasks automatically (and for this reason I love
> it!).
> But by saving the state of a configured cluster, another cluster instance
> could be deployed by updating only the /etc/hosts files and the SGE queue
> configuration. This would greatly reduce the total time required to
> start.
> Does that make sense?
>
> Cheers,
> Paolo
>
>
>
> On Oct 28, 2011, at 4:24 PM, Mark Gordon wrote:
>
> Hi Paolo:
>
> I wonder, what percentage of the launch time do you think is spent
> configuring the nodes?
>
> cheers,
> Mark
>
>
> On Fri, Oct 28, 2011 at 4:57 AM, Paolo Di Tommaso <Paolo.DiTommaso_at_crg.eu>
> wrote:
>>
>> Dear All,
>>
>> I'm still struggling with this problem of large clusters that take so
>> long to launch.
>>
>> I think some improvements are possible with better multithreading,
>> but I'm not a Python guru, so I cannot comment on that in detail.
>>
>> Anyway, I'm looking for a more "radical" approach. My idea is to launch a
>> 2-node cluster, save the master and slave nodes as two separate AMIs, and use
>> these to deploy a cluster of any size without having to install and
>> configure everything from scratch (NFS, SGE, passwordless access, etc.),
>> modifying only what has changed.
>>
>>
>> So my question is: what is the "delta" in the configuration files
>> between two different cluster instances of X and Y nodes?
>>
>> Knowing this, it could be quite easy to write a StarCluster plugin that
>> applies only these changes, achieving a much faster launch time.
>>
>>
>> Thank you,
>>
>> Paolo Di Tommaso
>> Software Engineer
>> Comparative Bioinformatics Group
>> Centre de Regulacio Genomica (CRG)
>> Dr. Aiguader, 88
>> 08003 Barcelona, Spain
>>
>>
>>
>>
>> On Oct 20, 2011, at 9:48 PM, Rayson Ho wrote:
>>
>> > ----- Original Message -----
>> >> However, if one can wrap around the real ssh with a fake ssh script
>> >> that sleeps 30 seconds and then runs the real ssh, then we can see how
>> >> good (or bad) the Workerpool handles long latency commands - and we
>> >> will start from there to optimize the launch performance.
>> >
>> > Replying to myself - after quickly reading the code...
>> >
>> > StarCluster uses Paramiko instead of executing ssh, so wrapping around a
>> > long latency ssh script won't work.
>> >
>> > And there are quite a lot of discussions about issues with multithreaded
>> > programs that call Paramiko -- just google: Paramiko+multithreading
>> >
>> >
>> > Rayson
>> >
>> > =================================
>> > Grid Engine / Open Grid Scheduler
>> > http://gridscheduler.sourceforge.net
>> > _______________________________________________
>> > StarCluster mailing list
>> > StarCluster_at_mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>
>
>
> --
>
> Mark Gordon
>
> Systems Analyst
> Department of Physics
> University of Alberta
>
> This communication is intended for the use of the recipient to which it is
> addressed and may contain confidential, personal and/or privileged
> information. Please contact us immediately if you are not the intended
> recipient of this communication. If you are not the intended recipient of
> this communication do not copy, distribute or take action on it. Any
> communication received in error, or subsequent reply, should be deleted or
> destroyed.
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>

What version of StarCluster are you using, Paolo?
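
To make the /etc/hosts part of the "delta" quoted above a bit more concrete, here is a minimal sketch. It is only an illustration and not StarCluster code: the function names `hosts_delta` and `render_hosts`, the alias scheme (`master`, `node001`, ...) and the IP addresses are all assumptions made for the example, and the SGE queue side of the delta is not covered.

```python
def hosts_delta(old_nodes, new_nodes):
    """Compare two alias -> IP mappings (hypothetical cluster layouts).

    Returns three sorted lists of aliases: added, removed, and those whose
    IP address changed between the saved and the requested cluster.
    """
    added = sorted(set(new_nodes) - set(old_nodes))
    removed = sorted(set(old_nodes) - set(new_nodes))
    changed = sorted(
        alias
        for alias in set(old_nodes) & set(new_nodes)
        if old_nodes[alias] != new_nodes[alias]
    )
    return added, removed, changed


def render_hosts(nodes):
    """Render /etc/hosts content for an alias -> IP mapping."""
    lines = ["127.0.0.1 localhost"]
    for alias, ip in sorted(nodes.items()):
        lines.append(f"{ip} {alias}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up addresses: a saved 2-node cluster versus a requested 3-node one.
    saved = {"master": "10.0.0.1", "node001": "10.0.0.2"}
    wanted = {"master": "10.0.0.7", "node001": "10.0.0.8", "node002": "10.0.0.9"}
    print(hosts_delta(saved, wanted))
    print(render_hosts(wanted), end="")
```

Note that when every instance is launched fresh from a saved AMI, all private IPs will most likely differ, so in practice the whole file would be regenerated on each node rather than patched.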

-- 
Matthew W. Summers
Gentoo Foundation Inc.

Received on Fri Oct 28 2011 - 12:07:37 EDT

Next message: Paolo Di Tommaso: "Re: 100 nodes cluster"
Previous message: Paolo Di Tommaso: "Re: 100 nodes cluster"
In reply to: Paolo Di Tommaso: "Re: 100 nodes cluster"
Next in thread: Paolo Di Tommaso: "Re: 100 nodes cluster"
Reply: Paolo Di Tommaso: "Re: 100 nodes cluster"
Reply: Rayson Ho: "Re: 100 nodes cluster"

This archive was generated by hypermail 2.3.0.