StarCluster - Mailing List Archive

Re: 100 nodes cluster

From: Luis M. Carril <no email>
Date: Mon, 17 Oct 2011 11:48:23 +0200

Hi,
     Although I've never tested a deployment that big, I've had a lot of
problems with 10-20 node deployments. One or two machines always hang
while booting or deploying, which is pretty annoying; I can't fully
automate the cluster deployment because I have to watch it and
stop or boot the failing nodes.

Best regards
Luis M Carril

On 14/10/2011 16:46, Paolo Di Tommaso wrote:
> Hi All,
>
> I tried to set up a cluster of 100 nodes with quite powerful machines (High-Memory Double Extra Large configuration), but it ended in total failure.
>
> The overall configuration process was extremely slow. Five instances were stuck in the pending state for more than 10 minutes, so I had to terminate them manually.
>
> Other machines also returned errors, for example while mounting /home and setting up SGE components.
>
> I had to stop the initialization phase manually after more than 30 minutes because it seemed to hang.
>
>
> I'm not blaming StarCluster; it is really a nice piece of software. The problem really seems to be the Amazon infrastructure, which has a lot of latency and unreliable behavior.
>
>
> What is your opinion on this? Is anyone successfully running a "big" cluster with StarCluster?
>
>
>
>
> Thank you,
>
> Paolo Di Tommaso
> Software Engineer
> Comparative Bioinformatics Group
> Centre de Regulacio Genomica (CRG)
> Dr. Aiguader, 88
> 08003 Barcelona, Spain
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster

-- 
Luis M. Carril
Project Technician
Galicia Supercomputing Center (CESGA)
Avda. de Vigo s/n
15706 Santiago de Compostela
SPAIN
Tel: 34-981569810 ext 249
lmcarril_at_cesga.es
www.cesga.es
==================================================================
Received on Mon Oct 17 2011 - 05:48:28 EDT
