Re: starcluster starts but not all nodes added as exec nodes
I experienced the same thing with my 50-node configuration (c1.xlarge).
Out of 50 nodes, only 29 were successfully registered with SGE as exec hosts.
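In case it helps anyone narrow this down, here is a rough sketch of how the missing nodes could be listed by comparing the output of 'qconf -sel' on the master against the hostnames you expect. The helper names (expected_nodes, missing_exec_hosts) are made up for illustration, and the naming scheme (master, node001, node002, ...) and the assumption that the master is itself an exec host are guesses about the cluster setup, so adjust them to match yours.

```python
def expected_nodes(cluster_size):
    # Assumed naming scheme: 'master' plus node001..nodeNNN.
    # Assumes the master also acts as an exec host.
    return ["master"] + ["node%03d" % i for i in range(1, cluster_size)]


def missing_exec_hosts(qconf_sel_output, expected):
    # 'qconf -sel' prints one exec host per line; keep only the short
    # hostname in case SGE reports fully qualified names.
    registered = set()
    for line in qconf_sel_output.splitlines():
        line = line.strip()
        if line:
            registered.add(line.split(".")[0])
    # Preserve the order of the expected list in the result.
    return [host for host in expected if host not in registered]
```

To use it, capture the output on the master, for example with subprocess.check_output(["qconf", "-sel"]).decode(), and pass it in together with expected_nodes(50). The hosts it returns are the ones to check for a missing sge_execd.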
On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <jeff_at_decide.com> wrote:
> I can frequently reproduce an issue where 'starcluster start' completes
> without error, but not all nodes are added to the SGE pool, which I verify
> by running 'qconf -sel' on the master. The latest example I have is creating
> a 25-node cluster, where only the first 12 nodes are successfully installed.
> The remaining instances are running and I can ssh to them but they aren't
> running sge_execd. There are only install log files for the first 12 nodes
> in /opt/sge6/default/common/install_logs. I have not found any clues in the
> starcluster debug log or the logs inside master:/opt/sge6/.
> I am running starcluster development snapshot 8ef48a3 downloaded on
> 2011-02-15, with the following relevant settings:
> NODE_INSTANCE_TYPE = m1.small
> I have seen this behavior with the latest 32-bit and 64-bit starcluster
> AMIs. Our workaround is to start a small cluster and progressively add nodes
> one at a time, which is time-consuming.
> Has anyone else noticed this and have a better workaround or an idea for a
Received on Tue Mar 15 2011 - 18:30:28 EDT