StarCluster - Mailing List Archive

Re: [Starcluster] StarCluster timeout problem

From: Nasser Alansari <no email>
Date: Wed, 31 Mar 2010 03:59:12 +1100

Hi Justin,

Thanks for the quick response.

I've downloaded the development version and noticed many new changes
compared to version 0.91.

However, I've found 2 bugs.

If no volume is specified for a cluster in the config file,
StarCluster crashes with:

 TypeError: 'NoneType' object is not iterable

After tracking the problem down: "self.VOLUMES" in "" has a value of
None, and two functions (setup_ebs, setup_nfs) in "" try to iterate over
it with a for loop.

Quick fix: I've added the following after line 154:

    if not volumes:
        self.VOLUMES = []

I'm sure there is a better way to fix this.
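One possibly cleaner variant is to normalise the value once, where it is assigned, so that every loop over it is safe. The sketch below assumes the constructor receives the configured volumes as a `volumes` argument (as the quick fix suggests); the `Cluster` class and the bodies of `setup_ebs`/`setup_nfs` are hypothetical stand-ins, not StarCluster's actual code.

```python
class Cluster:
    """Hypothetical, stripped-down stand-in for StarCluster's cluster class.

    Only the VOLUMES handling from the bug report is modelled; the real
    constructor takes many more settings.
    """

    def __init__(self, volumes=None):
        # Normalise a missing or empty volumes setting (e.g. None) to an
        # empty list once, so every later loop over self.VOLUMES is a no-op.
        self.VOLUMES = volumes or []

    def setup_ebs(self):
        # Hypothetical body: collect the volumes that would be attached.
        attached = []
        for vol in self.VOLUMES:
            attached.append(vol)
        return attached

    def setup_nfs(self):
        # Hypothetical body: collect the volumes that would be exported.
        exported = []
        for vol in self.VOLUMES:
            exported.append(vol)
        return exported


# No volumes configured: both setup steps now run without a TypeError.
print(Cluster().setup_ebs())                   # []
print(Cluster(volumes=["vol-1"]).setup_nfs())  # ['vol-1']
```

Doing the check in the constructor keeps setup_ebs and setup_nfs free of None guards, which is the same effect as the quick fix but in a single expression.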

To reproduce the second bug, start a cluster tagged "bug2", then stop the
cluster. Then start another cluster, also tagged "bug2". StarCluster tries
to SSH into the terminated instances from the first cluster, which have no
hostname.

I've added the following after line 278 to filter out the terminated
instances:

    if node.state == 'terminated':
        continue
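The same filter can also be written as a list comprehension applied before any SSH probing. This is a sketch only: `Node`, `SKIP_STATES` and `nodes_to_probe` are hypothetical names, and the sample instance IDs and hostname are made up for illustration; only the "terminated" state check comes from the report.

```python
from collections import namedtuple

# Hypothetical stand-in for an EC2 instance record; only the fields this
# example needs are modelled.
Node = namedtuple("Node", ["id", "state", "dns_name"])

# States whose instances can never answer SSH. Only "terminated" comes from
# the bug report; extend the set if other states cause the same problem.
SKIP_STATES = frozenset(["terminated"])


def nodes_to_probe(nodes):
    """Return the nodes worth checking over SSH, preserving their order."""
    return [node for node in nodes if node.state not in SKIP_STATES]


# Illustrative data: one leftover node from the first "bug2" cluster.
nodes = [
    Node("i-old1", "terminated", ""),
    Node("i-new1", "running", "ec2-host-1.example"),
    Node("i-new2", "pending", ""),
]
print([n.id for n in nodes_to_probe(nodes)])  # ['i-new1', 'i-new2']
```

Pending nodes are deliberately kept, since they are expected to come up; only nodes that can never get a hostname again are dropped.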

Again, thanks for your great effort.

On Tue, Mar 30, 2010 at 5:39 AM, Justin Riley <> wrote:

> Hi Nasser
> I've cc'd the starcluster mailing list, hope you don't mind.
> BTW, I'd like to invite you to join the starcluster mailing list. It's a
> good place to keep up with things and submit issues
> such as these. You can join the list here:
> Thanks for reporting this issue. I've made a quick-fix change in the
> development version of the code on github by bumping the timeout to 5
> sec. This still might not help you if the latency is really bad.
> My current thinking on this is to 'throttle' the timeout: the longer the
> cluster takes to appear to be up, the longer the timeout. So at first it
> would attempt a 5-second timeout and then incrementally raise it up to 15
> seconds as necessary. After reaching the 15-second maximum and enough
> retries, it would likely just error out.
> This is on my list for the next version.
> Thanks for reporting!
> ~Justin
> > Problem:
> > I've installed & configured StartCluster correctly. However, when I
> try to start it with "startcluster -s", everything goes fine until it
> reach the line ">>> Waiting for cluster to start..." and that when it
> run forever(infinite loop). Even after all the instances are in
> "running" state.
> >
> > Solution:
> > After debugging, I found that the socket timeout value (0.25) in:
> >
> > File: starcluster/
> > Function: is_ssh_up()
> > Line: s.settimeout(0.25)
> >
> > is too small for my connection because of latency.
> >
> > So, as a quick fix, I commented out that line and everything works fine.
> >
> > A bigger value would solve this.
> >
> > Thanks for your great work, and keep it up.
> > Nasser
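A minimal sketch of the throttled timeout described in the thread, in Python 3 with only the standard library. It assumes is_ssh_up does nothing more than open a TCP connection to the SSH port (the archive elides the file path and does not show the function body). The names wait_for_ssh, step and max_rounds, the 2.5-second step, the 1-second pause and the 10-round limit are hypothetical choices; only the 5-second start and the 15-second ceiling come from the messages above.

```python
import socket
import time


def is_ssh_up(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Assumption: like the function in the report, this only checks that the
    SSH port accepts connections; it does not log in.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or socket.timeout
        return False


def wait_for_ssh(hosts, start_timeout=5.0, max_timeout=15.0, step=2.5,
                 max_rounds=10, probe=is_ssh_up, sleep=time.sleep):
    """Poll until every host answers, growing the per-attempt timeout.

    The timeout starts at start_timeout, grows by step after each round
    that leaves hosts unanswered, and is capped at max_timeout. After
    max_rounds rounds it raises instead of looping forever.
    Returns the number of rounds that were needed.
    """
    timeout = start_timeout
    pending = list(hosts)
    for round_no in range(1, max_rounds + 1):
        # Keep only the hosts that did not answer in this round.
        pending = [h for h in pending if not probe(h, timeout=timeout)]
        if not pending:
            return round_no
        timeout = min(timeout + step, max_timeout)
        sleep(1)  # brief pause between rounds
    raise RuntimeError("SSH still unreachable on: " + ", ".join(pending))
```

Removing settimeout() altogether, as in the original quick fix, leaves the socket in blocking mode, so a dead host can stall a probe for as long as the operating system's TCP connect timeout. A bounded, growing timeout keeps each round predictable while still tolerating high latency, and the round limit turns the infinite wait into an error.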
Received on Tue Mar 30 2010 - 12:59:13 EDT
This archive was generated by hypermail 2.3.0.

