StarCluster - Mailing List Archive

Re: starcluster starts but not all nodes added as exec nodes

From: Justin Riley <no email>
Date: Wed, 16 Mar 2011 11:57:51 -0400

Hi Jeff/Joseph,

I just requested an increase to my EC2 instance limit so that I can test
things out at this scale and see what the issue is. In the meantime, would
you mind sending me any logs found in /opt/sge6/default/common/install_logs,
along with /opt/sge6/ec2_sge.conf, from a failed run?

Also, if this happens again, you could try reinstalling SGE manually,
assuming all the nodes are up:

$ starcluster sshmaster mycluster
$ cd /opt/sge6
$ ./inst_sge -m -x -auto ./ec2_sge.conf
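
Before reinstalling, it may help to see exactly which nodes never made it
into the SGE pool. The following is a minimal sketch, not part of
StarCluster: the function name missing_exec_hosts and the file names
expected_hosts.txt and registered_hosts.txt are hypothetical. It assumes you
keep a list of the hostnames you expect (one per line) and compares it with
the output of 'qconf -sel' saved to a file on the master. It also assumes
both lists use the same hostname form (short names versus fully qualified
names).

```shell
#!/bin/sh
# missing_exec_hosts EXPECTED_FILE REGISTERED_FILE
# Print every hostname that appears in EXPECTED_FILE but not in
# REGISTERED_FILE. Both files hold one hostname per line, in any order.
# (Hypothetical helper for illustration; not a StarCluster or SGE command.)
missing_exec_hosts() {
    expected_sorted=$(mktemp)
    registered_sorted=$(mktemp)
    # comm requires sorted input, so sort and de-duplicate both lists first.
    sort -u "$1" > "$expected_sorted"
    sort -u "$2" > "$registered_sorted"
    # -2 drops lines unique to the second file and -3 drops common lines,
    # leaving only hosts that are expected but not registered.
    comm -23 "$expected_sorted" "$registered_sorted"
    rm -f "$expected_sorted" "$registered_sorted"
}

# Example use on the master (file names are assumptions):
#   qconf -sel > registered_hosts.txt
#   missing_exec_hosts expected_hosts.txt registered_hosts.txt
```

Any hostnames printed are the ones to check for a missing sge_execd process
or a missing install log.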

~Justin

On 03/15/2011 06:30 PM, Kyeong Soo (Joseph) Kim wrote:
> Hi Jeff,
>
> I experienced the same thing with my 50-node configuration (c1.xlarge).
> Out of 50 nodes, only 29 were successfully identified by SGE.
>
> Regards,
> Joseph
>
> On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <jeff_at_decide.com> wrote:
>> I can frequently reproduce an issue where 'starcluster start' completes
>> without error, but not all nodes are added to the SGE pool, which I verify
>> by running 'qconf -sel' on the master. The latest example I have is creating
>> a 25-node cluster, where only the first 12 nodes are successfully installed.
>> The remaining instances are running and I can ssh to them but they aren't
>> running sge_execd. There are only install log files for the first 12 nodes
>> in /opt/sge6/default/common/install_logs. I have not found any clues in the
>> starcluster debug log or the logs inside master:/opt/sge6/.
>>
>> I am running starcluster development snapshot 8ef48a3 downloaded on
>> 2011-02-15, with the following relevant settings:
>>
>> NODE_IMAGE_ID=ami-8cf913e5
>> NODE_INSTANCE_TYPE = m1.small
>>
>> I have seen this behavior with the latest 32-bit and 64-bit starcluster
>> AMIs. Our workaround is to start a small cluster and progressively add nodes
>> one at a time, which is time-consuming.
>>
>> Has anyone else noticed this, and does anyone have a better workaround or
>> an idea for a fix?
>>
>> jeff
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster

Received on Wed Mar 16 2011 - 11:58:16 EDT