Re: Large cluster (125 nodes) launch failure
Again, many thanks for your valuable suggestions!
I will try those the next time I configure a large cluster. For now, I
have terminated everything, started from scratch, and am running five
25-node clusters (though I still see the same issues even with this
configuration).
By the way, our postings crossed each other in two different
threads (the difficulty of multi-threaded discussions?).
Please check the other thread, where I just responded to your post
with the requested log files.
On Wed, Mar 16, 2011 at 4:24 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> On 03/16/2011 12:18 PM, Justin Riley wrote:
>> 1. wait for your jobs to finish and manually terminate the idle nodes to
>> stop paying for them in the mean time (tedious)
> You might also try on the idle nodes:
> $ cd /opt/sge6
> $ ./inst_sge -x -auto ./ec2_sge.conf
> I don't *think* this will affect any currently running jobs but I'm not
> 100% so if you're concerned I wouldn't recommend trying this.
Received on Wed Mar 16 2011 - 12:36:08 EDT