StarCluster - Mailing List Archive

Re: 100 nodes cluster

From: Luis M. Carril <no email>
Date: Mon, 17 Oct 2011 11:48:23 +0200

Although I've never tested a deployment that big, I've had a lot of
problems with 10-20 node deployments. One or two machines always hang
while booting or deploying, which is pretty annoying. Because of this I
can't fully automate cluster deployment: I have to watch it so I can
stop or reboot the failing nodes.

Best regards
Luis M Carril

On 14/10/2011 16:46, Paolo Di Tommaso wrote:
> Hi All,
> I've tried to set up a cluster with 100 nodes using quite powerful machines (High-Memory Double Extra Large configuration), but it ended in total failure.
> The overall configuration process was extremely slow. Five instances were stuck in the pending state for more than 10 minutes, so I had to terminate them manually.
> Other machines also returned error codes, for example when mounting /home and setting up other SGE components.
> I had to stop the initialization phase manually after more than 30 minutes, because it seemed to hang.
> I'm not blaming StarCluster; it is really a nice piece of software. The problem really seems to be the Amazon infrastructure, which has a lot of latency and unreliable behavior.
> What is your opinion on this? Is anyone successfully running a "big" cluster with StarCluster?
> Thank you,
> Paolo Di Tommaso
> Software Engineer
> Comparative Bioinformatics Group
> Centre de Regulacio Genomica (CRG)
> Dr. Aiguader, 88
> 08003 Barcelona, Spain

Luis M. Carril
Project Technician
Galicia Supercomputing Center (CESGA)
Avda. de Vigo s/n
15706 Santiago de Compostela
Tel: 34-981569810 ext 249
Received on Mon Oct 17 2011 - 05:48:28 EDT