StarCluster - Mailing List Archive

Re: starcluster starts but not all nodes added as exec nodes

From: Joseph <Kyeong>
Date: Tue, 15 Mar 2011 22:30:27 +0000

Hi Jeff,

I experienced the same thing with my 50-node configuration (c1.xlarge).
Out of 50 nodes, only 29 were successfully registered with SGE.
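
In case it helps anyone narrow this down, here is a rough sketch for listing which nodes never made it into the exec host list. It only parses the output of 'qconf -sel' (the check Jeff describes below) and compares it with the hostnames one would expect. The helper names are hypothetical, and the "master" plus "node001", "node002", ... naming is an assumption about how the cluster was set up, so adjust it to match your own hosts.

```python
import subprocess


def registered_exec_hosts(qconf_output):
    """Return the set of hostnames listed by 'qconf -sel' (one per line)."""
    return {line.strip() for line in qconf_output.splitlines() if line.strip()}


def expected_node_names(cluster_size):
    """Hypothetical naming scheme: 'master' plus node001..nodeNNN.

    This is an assumption; change it if your hosts are named differently.
    """
    return ["master"] + ["node%03d" % i for i in range(1, cluster_size)]


def missing_nodes(expected, qconf_output):
    """Return the expected hosts that SGE does not list as exec hosts."""
    # Compare short hostnames in case SGE reports fully qualified names.
    registered = {h.split(".")[0] for h in registered_exec_hosts(qconf_output)}
    return [host for host in expected if host not in registered]


def current_qconf_output():
    """Run 'qconf -sel' locally; meant to be called on the master node."""
    return subprocess.check_output(["qconf", "-sel"]).decode()
```

Run on the master, `missing_nodes(expected_node_names(50), current_qconf_output())` would give the hosts to look at first, for example to check whether sge_execd is running on them.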

Regards,
Joseph

On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <jeff_at_decide.com> wrote:
> I can frequently reproduce an issue where 'starcluster start' completes
> without error, but not all nodes are added to the SGE pool, which I verify
> by running 'qconf -sel' on the master. The latest example I have is creating
> a 25-node cluster, where only the first 12 nodes are successfully installed.
> The remaining instances are running and I can ssh to them but they aren't
> running sge_execd. There are only install log files for the first 12 nodes
> in /opt/sge6/default/common/install_logs. I have not found any clues in the
> starcluster debug log or the logs inside master:/opt/sge6/.
>
> I am running starcluster development snapshot 8ef48a3 downloaded on
> 2011-02-15, with the following relevant settings:
>
> NODE_IMAGE_ID=ami-8cf913e5
> NODE_INSTANCE_TYPE = m1.small
>
> I have seen this behavior with the latest 32-bit and 64-bit starcluster
> AMIs. Our workaround is to start a small cluster and progressively add nodes
> one at a time, which is time-consuming.
>
> Has anyone else noticed this, and does anyone have a better workaround or
> an idea for a fix?
>
> jeff
>
Received on Tue Mar 15 2011 - 18:30:28 EDT