StarCluster - Mailing List Archive

Re: 100 nodes cluster

From: Matthew Summers <no email>
Date: Mon, 17 Oct 2011 09:58:58 -0500

On Mon, Oct 17, 2011 at 4:48 AM, Luis M. Carril <lmcarril_at_cesga.es> wrote:
> Hi,
>     Although I've never tested a deployment that big, I've had a lot of
> problems with 10-20 node deployments. One or two machines always hang
> while booting or deploying, which is pretty annoying; I can't fully
> automate the cluster deployment because I have to watch it and
> stop or boot the failing nodes.
>
> Best regards
> Luis M Carril
>
> On 14/10/2011 16:46, Paolo Di Tommaso wrote:
>> Hi All,
>>
>> I tried to set up a cluster with 100 nodes using quite powerful machines (the Hi-Mem double extra large configuration), but it ended in total failure.
>>
>> The overall configuration process was extremely slow. Five instances were stuck in the pending state for more than 10 minutes, so I had to terminate them manually.
>>
>> Other machines also returned errors, for example when mounting /home and in some SGE components.
>>
>> I had to stop the initialization phase manually after more than 30 minutes, because it seemed to hang.
>>
>>
>> I'm not blaming StarCluster; it is really a nice piece of software. The problem really seems to be the Amazon infrastructure, which has a lot of latency and unreliable behavior.
>>
>>
>> What is your opinion on this? Is anyone successfully running a "big" cluster using the StarCluster tool?
>>
>>
>>
>>
>> Thank you,
>>
>> Paolo Di Tommaso
>> Software Engineer
>> Comparative Bioinformatics Group
>> Centre de Regulacio Genomica (CRG)
>> Dr. Aiguader, 88
>> 08003 Barcelona, Spain
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>
> --
> Luis M. Carril
> Project Technician
> Galicia Supercomputing Center (CESGA)
> Avda. de Vigo s/n
> 15706 Santiago de Compostela
> SPAIN
>
> Tel: 34-981569810 ext 249
> lmcarril_at_cesga.es
> www.cesga.es
>
>

Are you guys running a versioned release or HEAD from git? I am more
than fairly certain this has been optimized in the repo, iirc a few
months ago.

-- 
Matthew W. Summers
Gentoo Foundation Inc.
Received on Mon Oct 17 2011 - 10:59:00 EDT