StarCluster - Mailing List Archive

Re: Crash report

From: Daniel Povey <no email>
Date: Wed, 6 Feb 2013 00:17:06 -0500

Also, somehow this cluster got into a weird state, with two copies of

Cluster nodes:
     master running i-b0a5cec0
    node001 running i-5c3e542c
    node001 running i-063e5476
    node002 running i-5a32582a
    node003 running i-5c32582c
    node004 running i-da741eaa
    node005 running i-dc741eac
    node006 running i-a06515d0
    node007 running i-a26515d2
    node008 running i-c4493ab4
    node009 running i-c6493ab6
    node010 running i-c8493ab8
Total nodes: 12
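
For anyone wanting to spot this kind of duplicate automatically, here is a minimal Python sketch that scans a node listing like the one above for aliases that appear more than once. It assumes each node line has the form "alias state instance-id"; the function name find_duplicate_aliases is made up for illustration and is not part of StarCluster.

```python
import re
from collections import defaultdict

# Matches lines such as "    node001 running i-5c3e542c".
# Header and summary lines ("Cluster nodes:", "Total nodes: 12") do not match.
NODE_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(i-[0-9a-f]+)\s*$")


def find_duplicate_aliases(listing):
    """Return {alias: [instance ids]} for every alias listed more than once.

    Hypothetical helper, not a StarCluster function.
    """
    instances = defaultdict(list)
    for line in listing.splitlines():
        match = NODE_LINE.match(line)
        if match:
            alias, _state, instance_id = match.groups()
            instances[alias].append(instance_id)
    return {alias: ids for alias, ids in instances.items() if len(ids) > 1}


# The listing from this message, pasted verbatim.
LISTING = """\
Cluster nodes:
     master running i-b0a5cec0
    node001 running i-5c3e542c
    node001 running i-063e5476
    node002 running i-5a32582a
    node003 running i-5c32582c
    node004 running i-da741eaa
    node005 running i-dc741eac
    node006 running i-a06515d0
    node007 running i-a26515d2
    node008 running i-c4493ab4
    node009 running i-c6493ab6
    node010 running i-c8493ab8
Total nodes: 12
"""

duplicates = find_duplicate_aliases(LISTING)
print(duplicates)
```

On the listing above this reports node001 with its two instance ids.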

Also some nodes (e.g. 002, 003, 004) were not listed in @allhosts in the
Possibly this is because I was running the load balancer? It didn't seem
to be working quite right; it wasn't really removing nodes.
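
A rough way to check which nodes have dropped out of the host group is to compare the cluster's aliases against the hostlist printed by qconf -shgrp @allhosts. The Python sketch below assumes that output has a "hostlist" line, possibly continued over several lines with trailing backslashes. The sample output is invented for illustration (only the three missing nodes come from this message), and the helper names are hypothetical.

```python
def parse_hostlist(shgrp_output):
    """Extract the set of host names from `qconf -shgrp <group>` output.

    Assumes a line starting with "hostlist", with long lists continued
    by a trailing backslash. An empty group is reported as NONE.
    """
    # Join backslash-continued lines into one logical line.
    joined = shgrp_output.replace("\\\n", " ")
    for line in joined.splitlines():
        fields = line.split()
        if fields and fields[0] == "hostlist":
            return {host for host in fields[1:] if host != "NONE"}
    return set()


def missing_from_group(aliases, shgrp_output):
    """Return the sorted aliases that are absent from the host group."""
    return sorted(set(aliases) - parse_hostlist(shgrp_output))


# Hypothetical output, made up to mirror the situation described above.
SAMPLE_SHGRP = r"""group_name @allhosts
hostlist master node001 node005 node006 \
         node007 node008 node009 node010
"""

# Aliases from the cluster listing: master plus node001..node010.
ALIASES = ["master"] + ["node%03d" % i for i in range(1, 11)]

missing = missing_from_group(ALIASES, SAMPLE_SHGRP)
print(missing)
```

With the made-up sample this prints node002, node003 and node004, which are the nodes mentioned above as missing.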


On Wed, Feb 6, 2013 at 12:14 AM, Daniel Povey <> wrote:

> BTW, I manually removed it from the queue using qconf -mhgrp @allhosts
> before I called the rn command (because I wanted to make sure no jobs were
> running on the nodes I was removing and I wasn't sure whether the rn
> command would wait). Not sure if this would cause the crash.
> Dan
Received on Wed Feb 06 2013 - 00:17:07 EST