StarCluster - Mailing List Archive

Re: force starcluster run

From: Justin Riley <no email>
Date: Sun, 05 Dec 2010 20:58:06 -0500

Hi Adam,

> StarCluster rocks! Great job Justin et al.

Thanks a lot, glad you like it :D

> I was using starcluster (v. 0.9999) to start an 80 node spot instance cluster recently and ran into an issue.
>
> starcluster start -b 0.10 -s 80 SpotCluster

Wow, OK, I haven't tried that many nodes before, but it should work.
Please be patient with the setup; I'd imagine it will take some time
given the size of the cluster.

> It took a few minutes for the spots to open and the instances to be running. StarCluster was still waiting on instances to come up, so I ran the start command with --no-create:
>
> starcluster start --no-create -s 80 SpotCluster
> starcluster start --no-create SpotCluster
>
> I can verify with the AWS console and the output of 'starcluster listclusters' that all 80 instances are up and running. Is there a way to force starcluster to run the install? Is starcluster checking something other than ec2-describe-instances, like ssh, to see if a node is up?
>
> Not sure if this is due to the cluster size, spot instances, or just an anomaly like one node not starting sshd.

While it is 'Waiting for cluster to start', StarCluster checks two
things: that there are CLUSTER_SIZE nodes in a 'running' state, and
that ssh is up on every 'running' node in the cluster. That is why
StarCluster keeps waiting even though you see all instances 'running'
in ec2-describe-instances: ssh is likely not up yet on *all* of them,
even though they're all in the 'running' state. There really can't be
a 'force install', because StarCluster has to connect to every node in
the cluster via ssh before it can do anything with the instances.
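To illustrate the two conditions described above, here is a minimal sketch of such a readiness check. It is not StarCluster's actual code: node records are assumed to be plain (hostname, state) pairs, and the names `is_cluster_ready` and `ssh_port_open` are hypothetical. "ssh is up" is approximated by a plain TCP connect to port 22, which only shows that sshd is listening, not that a login would succeed.

```python
import socket
from typing import Callable, Iterable, Tuple

Node = Tuple[str, str]  # hypothetical (hostname, state) pair, e.g. ("node001", "running")


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Crude stand-in for "ssh is up": proves something is listening on
    the port, not that an ssh login would actually work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False


def is_cluster_ready(nodes: Iterable[Node], cluster_size: int,
                     ssh_check: Callable[[str], bool] = ssh_port_open) -> bool:
    """Ready only when cluster_size nodes are 'running' AND all of them answer on ssh."""
    running = [host for host, state in nodes if state == "running"]
    if len(running) < cluster_size:
        return False  # not enough nodes in the 'running' state yet
    # Every running node must pass; a single slow sshd keeps the whole cluster waiting.
    return all(ssh_check(host) for host in running)
```

The `ssh_check` parameter is there so the logic can be exercised without real instances; the key point is the final `all(...)`, which is why 79 reachable nodes out of 80 still means "waiting".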

That said, I'd also expect checking ssh on all the nodes to take a
while, so if you cancel too early StarCluster may not get enough time
to make connections to all 80 nodes. How long did you wait for
StarCluster before canceling the run?
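The timing point can be made concrete with a sketch of a polling loop with a deadline. This is an assumption-laden illustration, not StarCluster's implementation: `wait_for_all`, the `check` callable (for example, an ssh probe per host), the poll `interval`, and the choice to re-check only hosts that have not yet passed are all hypothetical. Canceling the run corresponds to the deadline expiring while some hosts are still pending.

```python
import time
from typing import Callable, Iterable


def wait_for_all(hosts: Iterable[str], check: Callable[[str], bool],
                 timeout: float, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until check(host) passes for every host, or until timeout elapses.

    Hosts that already passed are not re-checked, so each round only
    retries the stragglers. Returns True if all hosts passed in time.
    """
    pending = set(hosts)
    deadline = clock() + timeout
    while True:
        # Keep only the hosts that still fail the check.
        pending = {h for h in pending if not check(h)}
        if not pending:
            return True
        if clock() >= deadline:
            return False  # gave up while some hosts were still unreachable
        sleep(interval)
```

`clock` and `sleep` are injectable so the loop can be simulated without waiting; with real instances, the total wait is set by the slowest node, which is why a large cluster needs a generous timeout.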

Also, you mentioned you're using version 0.9999. When did you last
pull/install the changes from github?

> I'll try again today; I just wanted to see if I could buy a clue from the list.

OK great. If you don't mind, please report whether you're successful or
not. I'm very interested to know...

~Justin
Received on Sun Dec 05 2010 - 20:58:04 EST
This archive was generated by hypermail 2.3.0.