StarCluster - Mailing List Archive

Re: trouble with starting a large cluster

From: Rayson Ho <no email>
Date: Thu, 1 Sep 2011 14:11:22 -0700 (PDT)

In 0.92rc2, there's the addnode command, which lets you start with a small number of nodes and then grow the cluster.

"Adding and Removing Nodes from StarCluster"

http://web.mit.edu/stardev/cluster/docs/0.92rc2/manual/addremovenode.html
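
As a rough sketch of the "start small, then grow" approach, the Python helper below only builds the command lines needed to reach a target size; it does not run anything. It assumes that `starcluster addnode <cluster_tag>` adds one node per invocation, which should be checked against the manual page linked above. The function name `addnode_commands` and the cluster tag `mycluster` are made up for illustration.

```python
import shlex


def addnode_commands(cluster_tag, current_nodes, target_nodes):
    """Return one `starcluster addnode` command line per node still needed.

    Assumption: each `starcluster addnode <cluster_tag>` call adds a single
    node. Verify this against the 0.92rc2 manual before relying on it.
    """
    if target_nodes < current_nodes:
        raise ValueError("target_nodes must be >= current_nodes")
    # Quote the tag so the resulting line is safe to paste into a shell.
    command = "starcluster addnode " + shlex.quote(cluster_tag)
    return [command] * (target_nodes - current_nodes)
```

For example, `addnode_commands("mycluster", 2, 5)` returns three identical command lines, which can be reviewed and then run one at a time, so a single instance that fails to come up does not block the whole cluster start.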

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net


--- On Thu, 9/1/11, Rayson Ho <raysonlogin_at_yahoo.com> wrote:
> > #cli.py:1079 - ERROR - failed to connect to host
> > ec2-50-19-64-123.compute-1.amazonaws.com on port 22
> >
> > Looking at the AWS console, I could see all 30 instances
> > were up and running. I even checked a few boot logs (e.g.
> > right click on an instance and choose the "Get System Log"
> > menu item), which all looked OK to me, granted I didn't
> > check all 30 logs...,
>
> Can you check if "ec2-50-19-64-123" is stuck?
>
> I believe once in a while, a VM on EC2 fails to start up,
> but rebooting the machine would work around the issue. (It may
> be hardware related or a bug in the EC2 provisioning
> layer.)
>
> http://mailman.mit.edu/pipermail/starcluster/2011-April/000703.html
>
> Rayson
>
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
>
> > maybe there is one instance having
> > trouble starting, as the above message suggests... I'm
> > guessing this could simply be a timeout issue, but I don't
> > know if/where there's a place I can change this. Does
> > StarCluster skip any instances that fail to come up?
> >
> > And I'm using 0.91.2. I was hoping not to have to upgrade
> > (yet) as I need results fast and don't want to risk
> > breaking something during the upgrade. AWS gave me capacity
> > to run 400 instances, so I'm hoping this is an easily solved
> > problem and I will be able to use that capacity...
> >
> > Appreciate any help!
> >
> > fei
> >
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster_at_mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
> >
>
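
To check whether a single instance is stuck, as suggested in the quoted reply, one option is to probe port 22 yourself with a few retries. The sketch below is a generic TCP check written for illustration; it is not how StarCluster itself connects, and the function name `wait_for_ssh` and the default retry values are arbitrary assumptions.

```python
import socket
import time


def wait_for_ssh(host, port=22, attempts=5, timeout=10.0, delay=15.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    The attempt count, timeout, and delay are illustrative defaults,
    not values taken from StarCluster.
    """
    for attempt in range(attempts):
        try:
            # Opening and immediately closing the connection is enough to
            # show that something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling `wait_for_ssh("ec2-50-19-64-123.compute-1.amazonaws.com")` with the host from the error message would return False if the instance never accepts connections on port 22, in which case rebooting it from the AWS console is the workaround mentioned above.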
Received on Thu Sep 01 2011 - 17:11:25 EDT
This archive was generated by hypermail 2.3.0.
