StarCluster - Mailing List Archive

removenode failing

From: Silverstein <no email>
Date: Wed, 09 Dec 2015 21:59:54 -0800

I'm running the loadbalancer on a cluster with 5 compute nodes and a
master (started with a master and 1 compute node). It correctly
detected that it should remove nodes. It removed the nodes from SGE's
execute list, but the nodes were still in the cluster (listclusters
shows them). I then killed the loadbalancer and tried removing manually
via "removenode". This resulted in:

Remove 5 nodes from cluster5(y/n)? y
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node006 from SGE
!!! ERROR - Error occured while running plugin
'starcluster.plugins.sge.SGEPlugin':
!!! ERROR - remote command 'source /etc/profile && qconf -de node006'
!!! ERROR - failed with status 1:
!!! ERROR - denied: execution host "node006" does not exist
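
The error above suggests the load balancer had already deleted node006 from SGE's execution host list, so the plugin's second `qconf -de` had nothing left to delete. A minimal sketch of a pre-check for that situation follows. It assumes `qconf -sel` (which lists execution hosts, one per line) is run on the master and that the hosts appear under their node aliases. The helper names `split_by_sge_registration` and `list_exec_hosts` are hypothetical and are not part of StarCluster.

```python
import subprocess


def split_by_sge_registration(qconf_sel_output, nodes):
    """Split nodes into those SGE still lists and those it has already dropped.

    qconf_sel_output: text printed by `qconf -sel`, one execution host per line.
    nodes: aliases we intend to remove, e.g. ["node005", "node006"].
    """
    registered = {
        line.strip() for line in qconf_sel_output.splitlines() if line.strip()
    }
    still_registered = [n for n in nodes if n in registered]
    already_gone = [n for n in nodes if n not in registered]
    return still_registered, already_gone


def list_exec_hosts():
    # Assumption: this runs on the master with qconf on the PATH
    # (the plugin log above sources /etc/profile first).
    return subprocess.check_output(["qconf", "-sel"], universal_newlines=True)


if __name__ == "__main__":
    # Sample text standing in for real `qconf -sel` output.
    sample = "master\nnode001\n"
    print(split_by_sge_registration(sample, ["node001", "node006"]))
    # (['node001'], ['node006'])
```

Nodes in the second list are the ones for which `qconf -de` would be expected to fail with "does not exist", as it did for node006.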

So I forcibly removed them. When I do that, I get messages like this for
each node:

>>> Terminating node: node006 (i-d16e4815)
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node005 from SGE
!!! ERROR - Error occured while running plugin
'starcluster.plugins.sge.SGEPlugin':

Has anyone experienced this? If so, what is causing this?

Herc
Received on Thu Dec 10 2015 - 01:00:03 EST
This archive was generated by hypermail 2.3.0.
