StarCluster - Mailing List Archive

Re: starcluster starts but not all nodes added as exec nodes

From: Justin Riley <no email>
Date: Wed, 16 Mar 2011 11:57:51 -0400


Hi Jeff/Joseph,

I just requested an increase to my EC2 instance limit so that I can test
things at this scale and see what the issue is. In the meantime, would you
mind sending me any logs found in /opt/sge6/default/common/install_logs,
along with the /opt/sge6/ec2_sge.conf from a failed run?
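To gather both items in one step, here is a minimal sketch to run on the
master. It assumes the default /opt/sge6 layout mentioned above;
collect_sge_logs and the archive name are hypothetical, not part of
StarCluster:

```bash
#!/usr/bin/env bash
# collect_sge_logs (hypothetical helper): bundle the SGE install logs and the
# auto-install config into one gzipped tar archive.
# Arguments: SGE root (default /opt/sge6) and output archive path
# (default sge-debug.tar.gz).
collect_sge_logs() {
  local root=${1:-/opt/sge6}
  local out=${2:-sge-debug.tar.gz}
  # -C switches into the SGE root so the archive stores relative paths.
  tar czf "$out" -C "$root" default/common/install_logs ec2_sge.conf
}
```

Calling collect_sge_logs with no arguments writes sge-debug.tar.gz to the
current directory, which can then be copied off the master with scp.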

Also, if this happens again, you could try reinstalling SGE manually,
assuming all the nodes are up:

$ starcluster sshmaster mycluster
$ cd /opt/sge6
$ ./inst_sge -m -x -auto ./ec2_sge.conf
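To see which nodes are still missing before and after the reinstall, here
is a hedged sketch that compares `qconf -sel` output with the node names a
cluster of a given size should have. It assumes StarCluster's naming
(master, node001, node002, ...), that the master is also an exec host, and
that qconf prints hostnames that reduce to those aliases once any domain
suffix is stripped; missing_exec_hosts is a hypothetical helper:

```bash
#!/usr/bin/env bash
# missing_exec_hosts (hypothetical helper): read `qconf -sel` output on stdin
# and print each expected node name that SGE does not list as an exec host.
# Assumes names master, node001 ... node(N-1) for a cluster of size N.
missing_exec_hosts() {
  local size=$1 i host actual
  local -a expected=(master)
  for ((i = 1; i < size; i++)); do
    expected+=("$(printf 'node%03d' "$i")")
  done
  # Strip any domain suffix so "node001.ec2.internal" matches "node001".
  actual=$(sed 's/\..*$//' | sort -u)
  for host in "${expected[@]}"; do
    grep -qx -- "$host" <<<"$actual" || echo "$host"
  done
}
```

On the master, `qconf -sel | missing_exec_hosts 25` would list the nodes
that never registered in a 25-node cluster; empty output means every
expected node is an exec host.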


On 03/15/2011 06:30 PM, Kyeong Soo (Joseph) Kim wrote:
> Hi Jeff,
> I experienced the same thing with my 50-node configuration (c1.xlarge).
> Out of 50 nodes, only 29 were successfully identified by SGE.
> Regards,
> Joseph
> On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <> wrote:
>> I can frequently reproduce an issue where 'starcluster start' completes
>> without error, but not all nodes are added to the SGE pool, which I verify
>> by running 'qconf -sel' on the master. The latest example I have is creating
>> a 25-node cluster, where only the first 12 nodes are successfully installed.
>> The remaining instances are running and I can ssh to them but they aren't
>> running sge_execd. There are only install log files for the first 12 nodes
>> in /opt/sge6/default/common/install_logs. I have not found any clues in the
>> starcluster debug log or the logs inside master:/opt/sge6/.
>> I am running starcluster development snapshot 8ef48a3 downloaded on
>> 2011-02-15, with the following relevant settings:
>> NODE_IMAGE_ID = ami-8cf913e5
>> NODE_INSTANCE_TYPE = m1.small
>> I have seen this behavior with the latest 32-bit and 64-bit starcluster
>> AMIs. Our workaround is to start a small cluster and progressively add nodes
>> one at a time, which is time-consuming.
>> Has anyone else noticed this, and does anyone have a better workaround or
>> an idea for a fix?
>> jeff
>> _______________________________________________
>> StarCluster mailing list


Received on Wed Mar 16 2011 - 11:58:16 EDT
This archive was generated by hypermail 2.3.0.

