Re: starcluster starts but not all nodes added as exec nodes
Hi Jeff,
I experienced the same thing with my 50-node configuration (c1.xlarge).
Out of 50 nodes, only 29 were successfully added to SGE as execution hosts.
Regards,
Joseph
On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <jeff_at_decide.com> wrote:
> I can frequently reproduce an issue where 'starcluster start' completes
> without error, but not all nodes are added to the SGE pool, which I verify
> by running 'qconf -sel' on the master. In the latest example, I created
> a 25-node cluster and only the first 12 nodes were successfully installed.
> The remaining instances are running and I can ssh to them but they aren't
> running sge_execd. There are only install log files for the first 12 nodes
> in /opt/sge6/default/common/install_logs. I have not found any clues in the
> starcluster debug log or the logs inside master:/opt/sge6/.
>
> I am running starcluster development snapshot 8ef48a3 downloaded on
> 2011-02-15, with the following relevant settings:
>
> NODE_IMAGE_ID = ami-8cf913e5
> NODE_INSTANCE_TYPE = m1.small
>
> I have seen this behavior with the latest 32-bit and 64-bit starcluster
> AMIs. Our workaround is to start a small cluster and progressively add nodes
> one at a time, which is time-consuming.
>
> Has anyone else noticed this, and does anyone have a better workaround or
> an idea for a fix?
>
> jeff
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
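A minimal Python sketch of the check Jeff describes: run `qconf -sel` on the master and compare its output against the node names the cluster should have, listing the hosts that never became SGE execution hosts. This assumes StarCluster's alias scheme (`master`, `node001`, `node002`, ...) and assumes the master is itself registered as an exec host. The helper names (`expected_node_aliases`, `missing_exec_hosts`, `report_missing`) are hypothetical and are not part of StarCluster.

```python
import subprocess
from typing import Iterable, List


def expected_node_aliases(cluster_size: int) -> List[str]:
    # Assumption: StarCluster names the nodes 'master', 'node001', ..., node{N-1}.
    return ["master"] + [f"node{i:03d}" for i in range(1, cluster_size)]


def parse_exec_hosts(qconf_sel_output: str) -> List[str]:
    # `qconf -sel` prints one execution host per line; keep only the short
    # hostname so that 'node001' and 'node001.ec2.internal' compare equal.
    hosts = []
    for line in qconf_sel_output.splitlines():
        line = line.strip()
        if line:
            hosts.append(line.split(".")[0])
    return hosts


def missing_exec_hosts(expected: Iterable[str], qconf_sel_output: str) -> List[str]:
    # Expected aliases that do not appear in the registered exec host list.
    registered = set(parse_exec_hosts(qconf_sel_output))
    return [host for host in expected if host not in registered]


def report_missing(cluster_size: int) -> List[str]:
    # Run on the master node; raises CalledProcessError if qconf fails.
    result = subprocess.run(
        ["qconf", "-sel"], capture_output=True, text=True, check=True
    )
    return missing_exec_hosts(expected_node_aliases(cluster_size), result.stdout)
```

For the 25-node case above, calling `report_missing(25)` on the master would list the nodes that are running but lack `sge_execd`. These are the nodes to re-add by hand, rather than growing the whole cluster one node at a time.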
Received on Tue Mar 15 2011 - 18:30:28 EDT