StarCluster - Mailing List Archive

Re: failed to add/remove additional nodes

From: Ryan Golhar <no email>
Date: Thu, 3 Apr 2014 14:47:58 -0400

Never mind. I ran out of space on my root partition due to an error in one
of my scripts, which is why SGE could not write to its spooling database.
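
For anyone who hits the same "error writing object ... to spooling database"
message, a quick check of free space is worth doing first. This is a minimal
sketch, not part of StarCluster: the function name root_usage_pct and the 95%
threshold are my own assumptions, and it should be run on whichever node
executes qconf (presumably the master).

```shell
# Hypothetical helper (not a StarCluster command): print the percentage of
# space used on a mount point, defaulting to the root partition.
root_usage_pct() {
    # df -P forces POSIX single-line output; column 5 is capacity, e.g. "42%".
    df -P "${1:-/}" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

pct=$(root_usage_pct /)
# The 95 threshold is an arbitrary assumption; adjust to taste.
if [ "$pct" -ge 95 ]; then
    echo "root partition is ${pct}% full; writes such as SGE spooling may fail"
else
    echo "root partition is ${pct}% full"
fi
```

If the partition turns out to be full, freeing space and then retrying the
removenode command is the obvious next step.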


On Thu, Apr 3, 2014 at 2:32 PM, Ryan Golhar <ngsbioinformatics_at_gmail.com> wrote:

> Hi all - I have a 50 node spot cluster running. I tried to add 10
> additional nodes and at some point along the way it failed. Only 2 nodes
> were added to the cluster, but they aren't getting SGE jobs. I tried
> re-adding the nodes using '-x -a' but it fails. So I then tried to remove
> the nodes, and that is failing as well. How do I fix this? Here's the
> output:
>
> [ec2-user@awsmicro plugins]$ starcluster removenode ngscluster node060
>
> StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
>
> Software Tools for Academics and Researchers (STAR)
>
> Please submit bug reports to starcluster_at_mit.edu
>
>
> >>> Running plugin tagger.TaggerPlugin
>
> >>> Running plugin setupuserenv.SetupUserEnvironment
>
> >>> Running plugin starcluster.plugins.users.CreateUsers
>
> >>> Running plugin starcluster.plugins.sge.SGEPlugin
>
> >>> Removing node060 from SGE
>
> !!! ERROR - Error occured while running plugin
> 'starcluster.plugins.sge.SGEPlugin':
>
> !!! ERROR - remote command 'source /etc/profile && qconf -dattr
>
> !!! ERROR - hostgroup hostlist node060 @allhosts' failed with status 1:
>
> !!! ERROR - error writing object "@allhosts" to spooling database
>
>
> At this point, I have to go into the AWS web console and remove the nodes
> myself as starcluster isn't able to.
>
>
Received on Thu Apr 03 2014 - 14:48:01 EDT
