StarCluster - Mailing List Archive

Re: 100 nodes cluster

From: Paolo Di Tommaso <no email>
Date: Fri, 28 Oct 2011 18:16:03 +0200

The latest (0.92)

Cheers,
Paolo


On Oct 28, 2011, at 6:07 PM, Matthew Summers wrote:

> On Fri, Oct 28, 2011 at 10:44 AM, Paolo Di Tommaso
> <Paolo.DiTommaso_at_crg.eu> wrote:
>> Hi Gordon,
>> Starting a 100-node cluster takes 30 minutes (and 1 hour with 200).
>> Using an EBS-backed AMI, the machines' boot time is very short, less than 1
>> minute, and above all constant (it does not grow with the number of
>> requested instances).
>> So all the time is spent configuring the cluster.
>> StarCluster does a lot of tasks automatically (and for this reason I love
>> it!).
>> But by saving the state of a configured cluster, another cluster instance
>> could be deployed by updating only the /etc/hosts files and the SGE queue
>> configuration. This would greatly reduce the total time required to
>> start.
>> Does it make sense?
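
As a rough illustration of the "update only /etc/hosts" idea above, here is a minimal sketch that regenerates the host entries for a cluster of any size. The alias scheme (master, node001, ...) and the IP addresses are assumptions made for this example, not values taken from the thread, and the function is not part of StarCluster.

```python
def render_hosts(private_ips):
    """Return /etc/hosts text for a master plus worker nodes.

    private_ips: list of IP strings; the first one is treated as the master.
    The alias names (master, node001, ...) are assumed for illustration.
    """
    lines = ["127.0.0.1 localhost"]
    for index, ip in enumerate(private_ips):
        alias = "master" if index == 0 else "node%03d" % index
        lines.append("%s %s" % (ip, alias))
    return "\n".join(lines) + "\n"


# Hypothetical private addresses for a three-machine cluster.
hosts_text = render_hosts(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(hosts_text)
```

The same text could be written to every node from a saved image, so only this file and the SGE queue configuration would differ between cluster sizes.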
>>
>> Cheers,
>> Paolo
>>
>>
>>
>> On Oct 28, 2011, at 4:24 PM, Mark Gordon wrote:
>>
>> Hi Paolo:
>>
>> I wonder, what percentage of the launch time do you think is spent
>> configuring the nodes?
>>
>> cheers,
>> Mark
>>
>>
>> On Fri, Oct 28, 2011 at 4:57 AM, Paolo Di Tommaso <Paolo.DiTommaso_at_crg.eu>
>> wrote:
>>>
>>> Dear All,
>>>
>>> I'm still struggling with this problem of large clusters that take such a
>>> long time to launch.
>>>
>>> I think some improvements are possible with better multithread
>>> handling, but I'm not a Python guru, so I cannot say more about that in detail.
>>>
>>> Anyway, I'm looking for a more "radical" approach. My idea is to launch a
>>> 2-node cluster, save the master and slave nodes as two separate AMIs, and use
>>> these to deploy a cluster of any size without having to install and
>>> configure everything from scratch (NFS, SGE, passwordless access, etc.),
>>> modifying only what has changed.
>>>
>>>
>>> So my question is: what is the "delta" in the configuration files
>>> between two different cluster instances of X and Y nodes?
>>>
>>> Knowing this, it could be quite easy to write a StarCluster plugin that
>>> applies only these changes, achieving a much faster launch time.
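
One way to find such a "delta" empirically is to diff the same configuration file captured from two clusters of different sizes. The sketch below uses Python's standard difflib module. The file contents are invented for illustration, and config_delta is a hypothetical helper, not a StarCluster function.

```python
import difflib


def config_delta(old_text, new_text):
    """Return only the added ("+ ") and removed ("- ") lines between two files."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("+ ", "- "))
    ]


# Invented /etc/hosts snapshots from a 2-node and a 3-node cluster.
hosts_two_nodes = "10.0.0.1 master\n10.0.0.2 node001\n"
hosts_three_nodes = "10.0.0.1 master\n10.0.0.2 node001\n10.0.0.3 node002\n"

delta = config_delta(hosts_two_nodes, hosts_three_nodes)
print(delta)
```

Running the same comparison over each file that the launch process writes would show which files actually depend on the cluster size.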
>>>
>>>
>>> Thank you,
>>>
>>> Paolo Di Tommaso
>>> Software Engineer
>>> Comparative Bioinformatics Group
>>> Centre de Regulacio Genomica (CRG)
>>> Dr. Aiguader, 88
>>> 08003 Barcelona, Spain
>>>
>>>
>>>
>>>
>>> On Oct 20, 2011, at 9:48 PM, Rayson Ho wrote:
>>>
>>>> ----- Original Message -----
>>>>> However, if one can wrap around the real ssh with a fake ssh script
>>>>> that sleeps 30 seconds and then runs the real ssh, then we can see
>>>>> how good (or bad) the Workerpool handles long latency commands - and
>>>>> we will start from there to optimize the launch performance.
>>>>
>>>> Replying to myself - after quickly reading the code...
>>>>
>>>> StarCluster uses Paramiko instead of executing ssh, so wrapping around a
>>>> long latency ssh script won't work.
>>>>
>>>> And there are quite a lot of discussions about issues with multithreaded
>>>> programs that call Paramiko -- just google: Paramiko+multithreading
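
Since wrapping the ssh binary is not an option, the long-latency experiment could instead be approximated in pure Python. The sketch below is an assumption-laden stand-in: it uses the standard concurrent.futures thread pool rather than StarCluster's own worker pool, replaces the remote command with a sleep, and does not touch Paramiko at all. The node names, worker count, and latency are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_remote_command(node, latency):
    """Stand-in for a slow remote command: it only sleeps, then returns the node."""
    time.sleep(latency)
    return node


def timed_run(nodes, workers, latency):
    """Run the fake command on every node through a thread pool and time it."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda n: fake_remote_command(n, latency), nodes))
    return results, time.monotonic() - start


# 20 hypothetical nodes, 10 worker threads, 0.05 s of simulated latency each.
nodes = ["node%03d" % i for i in range(1, 21)]
results, elapsed = timed_run(nodes, workers=10, latency=0.05)
print("%d commands in %.2f s" % (len(results), elapsed))
```

If the elapsed time scales with the number of nodes divided by the number of workers, the pool is overlapping the waits as intended; if it scales with the number of nodes alone, the commands are effectively running serially.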
>>>>
>>>>
>>>> Rayson
>>>>
>>>> =================================
>>>> Grid Engine / Open Grid Scheduler
>>>> http://gridscheduler.sourceforge.net
>>>> _______________________________________________
>>>> StarCluster mailing list
>>>> StarCluster_at_mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>
>>>
>>
>>
>>
>> --
>>
>> Mark Gordon
>>
>> Systems Analyst
>> Department of Physics
>> University of Alberta
>>
>> This communication is intended for the use of the recipient to which it is
>> addressed and may contain confidential, personal and/or privileged
>> information. Please contact us immediately if you are not the intended
>> recipient of this communication. If you are not the intended recipient of
>> this communication do not copy, distribute or take action on it. Any
>> communication received in error, or subsequent reply, should be deleted or
>> destroyed.
>>
>>
>>
>>
>
> What version of starcluster are you using, Paolo?
>
> --
> Matthew W. Summers
> Gentoo Foundation Inc.
Received on Fri Oct 28 2011 - 12:17:42 EDT
This archive was generated by hypermail 2.3.0.
