StarCluster - Mailing List Archive

starcluster starts but not all nodes added as exec nodes

From: Jeff White <no email>
Date: Sat, 5 Mar 2011 14:15:41 -0800

I can frequently reproduce an issue where 'starcluster start' completes
without error, but not all nodes are added to the SGE pool, which I verify
by running 'qconf -sel' on the master. In my latest example I created a
25-node cluster, and only the first 12 nodes were successfully installed.
The remaining instances are running and I can ssh to them but they aren't
running sge_execd. There are only install log files for the first 12 nodes
in /opt/sge6/default/common/install_logs. I have not found any clues in the
starcluster debug log or the logs inside master:/opt/sge6/.
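To make the 'qconf -sel' check quicker to repeat, here is a minimal sketch
that compares the exec host list against the hostnames the cluster should
have. It assumes StarCluster names the instances master, node001, node002,
... and that the master is itself expected to be an exec host; both are
assumptions, so adjust expected_node_names() if your setup differs. The
function names are hypothetical, not part of StarCluster.

```python
import subprocess


def expected_node_names(cluster_size):
    """Hostnames assumed for a cluster of `cluster_size` instances:
    'master' plus node001, node002, ... (assumed naming scheme)."""
    return ["master"] + ["node%03d" % i for i in range(1, cluster_size)]


def parse_exec_hosts(qconf_output):
    """Parse `qconf -sel` output (one exec host per line) into a set of
    short hostnames, dropping any domain suffix."""
    hosts = set()
    for line in qconf_output.splitlines():
        line = line.strip()
        if line:
            hosts.add(line.split(".")[0])
    return hosts


def missing_exec_hosts(cluster_size, qconf_output):
    """Return the expected hostnames that SGE does not list as exec hosts."""
    registered = parse_exec_hosts(qconf_output)
    return [name for name in expected_node_names(cluster_size)
            if name not in registered]


def read_exec_host_list():
    """Run `qconf -sel` (must be run on the SGE master) and return its
    raw stdout; raises CalledProcessError if qconf fails."""
    result = subprocess.run(["qconf", "-sel"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

On the master, print(missing_exec_hosts(25, read_exec_host_list())) would
list the nodes that are running but were never added to the SGE pool.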

I am running starcluster development snapshot 8ef48a3 downloaded on
2011-02-15, with the following relevant settings:

NODE_IMAGE_ID = ami-8cf913e5
NODE_INSTANCE_TYPE = m1.small

I have seen this behavior with the latest 32-bit and 64-bit starcluster
AMIs. Our workaround is to start a small cluster and progressively add nodes
one at a time, which is time-consuming.

Has anyone else noticed this, and does anyone have a better workaround or an
idea for a fix?

jeff
Received on Sat Mar 05 2011 - 17:15:42 EST