StarCluster - Mailing List Archive

Re: Large cluster (125 nodes) launch failure

From: Joseph <Kyeong>
Date: Wed, 16 Mar 2011 16:36:07 +0000

Justin,

Again, many thanks for your valuable suggestions!
I will try those the next time I configure large clusters. For now, I
have terminated everything, started from scratch, and am running five
25-node clusters (though the same issues persist even with this
configuration).

By the way, our posts have crossed each other across two different
threads (the difficulty of multi-threaded discussions?).
Please check the other thread, where I just responded to your post
with the requested log files.

Regards,
Joseph


On Wed, Mar 16, 2011 at 4:24 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/16/2011 12:18 PM, Justin Riley wrote:
>> 1. wait for your jobs to finish and manually terminate the idle nodes to
>> stop paying for them in the mean time (tedious)
>
> You might also try on the idle nodes:
>
> $ cd /opt/sge6
> $ ./inst_sge -x -auto ./ec2_sge.conf
>
> I don't *think* this will affect any currently running jobs but I'm not
> 100% so if you're concerned I wouldn't recommend trying this.
>
> ~Justin
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk2A5CoACgkQ4llAkMfDcrmuhgCgkmTpoLKVZgwHJpOYUpMzi1dB
> qzUAnij2B/2ooh+kbDqU5bQTuJA2K44U
> =rmXZ
> -----END PGP SIGNATURE-----
Received on Wed Mar 16 2011 - 12:36:08 EDT
This archive was generated by hypermail 2.3.0.
