StarCluster - Mailing List Archive

Re: Starcluster stuck during setup

From: Cory Dolphin <no email>
Date: Tue, 25 Mar 2014 20:11:07 -0400

Whenever I try to add a node to a spot instance cluster, StarCluster does
not wait for the spot request to be fulfilled and instead errors
out:

starcluster addnode mycluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Launching node(s): node030
SpotInstanceRequest:sir-85f44249
>>> Waiting for spot requests to propagate...
>>> Waiting for node(s) to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
30/30 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
30/30 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.179 mins
!!! ERROR - node 'node030' does not exist


Once the spot instance request is fulfilled, the instance does not have a
name. It looks like someone else had this problem quite recently:
http://star.mit.edu/cluster/mlarchives/2058.html
I wonder what the difference between our setup and yours is?
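
As a stopgap for the behaviour described above, one option is to poll from the
outside until the new node is actually visible before running any follow-up
step. The helper below is only a minimal sketch in standalone Python 3: it is
not part of StarCluster, and the check you pass in (for example a function that
reports whether node030 is listed yet) is a hypothetical placeholder you would
have to write yourself.

```python
import time


def wait_until(check, timeout=600.0, interval=30.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns a truthy value.

    Returns True as soon as check() succeeds, or False once `timeout`
    seconds have passed without success. `clock` and `sleep` can be
    swapped out, which makes the helper easy to test without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage: node_is_visible is NOT a StarCluster function,
# it stands for whatever check you implement yourself.
#
#   if not wait_until(lambda: node_is_visible("node030"), timeout=900):
#       raise SystemExit("node030 never showed up")
```

The 30 second default interval simply mirrors the "updating every 30s" message
in the transcript; nothing here changes how StarCluster itself waits.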


On Tue, Mar 25, 2014 at 7:42 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:

> If you really have a slow connection, you may consider bootstrapping
> StarCluster on AWS, i.e., configure an m1.small (or even t1.micro) and
> install StarCluster on that node. In fact, there's a CloudFormation
> template for that:
>
> http://aws.typepad.com/aws/2012/06/ec2-spot-instance-updates-auto-scaling-and-cloudformation-integration-new-sample-app-1.html
>
> On the other hand, it's way easier to do it by hand and just launch
> an instance from the standard Ubuntu AMI, and then install StarCluster
> on that instance.
>
> And like others mentioned, most large StarClusters are launched by
> first starting a small cluster and then growing it dynamically. You
> should be able to run the addnode command from your qmaster node
> provided that you have StarCluster set up there (note that your AWS key
> will be on the EC2 instance, so it is slightly more risky if security
> is the main concern).
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>
>
> On Tue, Mar 25, 2014 at 8:04 AM, Butson, Christopher <cbutson_at_mcw.edu> wrote:
> > Interesting: I let it go and it eventually continued, but it took over an
> > hour to get past "Configuring passwordless ssh for root". Still waiting for
> > the cluster to finish startup...
> >
> > Christopher R. Butson, Ph.D.
> > Associate Professor
> > Biotechnology & Bioengineering Center
> > Departments of Neurology, Neurosurgery, Psychiatry & Behavioral Medicine
> > Medical College of Wisconsin
> > (414) 955-2678
> > cbutson_at_mcw.edu
> >
> >
> > From: Christopher Butson <cbutson_at_mcw.edu>
> > Date: Tuesday, March 25, 2014 12:13 PM
> > To: "starcluster_at_mit.edu" <starcluster_at_mit.edu>
> > Subject: Starcluster stuck during setup
> >
> > I'm on a slow internet connection overseas, trying to initiate a cluster
> > using StarCluster. Once I type "starcluster start mycluster" everything
> > seems to go OK, but it gets stuck at the following point and never seems to
> > get past it:
> >
> > >>> Mounting all NFS export path(s) on 79 worker node(s)
> > 79/79 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> > >>> Setting up NFS took 2.777 mins
> > >>> Configuring passwordless ssh for root
> >
> > Any idea why this might occur? Thanks,
> > Chris
> >
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster_at_mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
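
The "start small, then grow" advice quoted above can be turned into a list of
commands to run in order. The sketch below only builds and prints the command
lines, it does not run them. The function name, the default sizes and the
cluster name are made up for illustration, and it assumes that "-s" sets the
cluster size for "starcluster start" and "-n" sets the number of nodes for
"starcluster addnode"; please confirm both against "starcluster start --help"
and "starcluster addnode --help" for your version before relying on them.

```python
def plan_growth(target_size, initial_size=2, batch_size=10, cluster="mycluster"):
    """Return the commands to start a small cluster and grow it in batches.

    target_size  -- total number of nodes wanted (including the master)
    initial_size -- size of the cluster created by the first command
    batch_size   -- maximum number of nodes added per addnode call
    """
    if target_size < 1 or initial_size < 1 or batch_size < 1:
        raise ValueError("sizes must be positive integers")

    initial = min(initial_size, target_size)
    # Assumption: -s is the cluster-size option of "starcluster start".
    commands = [["starcluster", "start", "-s", str(initial), cluster]]

    remaining = target_size - initial
    while remaining > 0:
        step = min(batch_size, remaining)
        # Assumption: -n is the node-count option of "starcluster addnode".
        commands.append(["starcluster", "addnode", "-n", str(step), cluster])
        remaining -= step
    return commands


if __name__ == "__main__":
    # 80 nodes in total, i.e. one master plus the 79 workers from the thread.
    for cmd in plan_growth(80, initial_size=2, batch_size=30):
        print(" ".join(cmd))
```

Adding nodes in batches keeps each setup phase short, which may help on a slow
link, but whether it avoids the long "Configuring passwordless ssh" step
reported in this thread is untested here.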
Received on Tue Mar 25 2014 - 20:11:11 EDT