I'm running the loadbalancer on a cluster with 5 compute nodes and a
master (started with a master and 1 compute node). It correctly
detected that it should remove nodes. It removed the nodes from SGE's
execution host list, but the nodes are still in the cluster (listclusters
shows them). I then killed the loadbalancer and tried removing them
manually via "removenode". This resulted in:
Remove 5 nodes from cluster5(y/n)? y
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node006 from SGE
!!! ERROR - Error occured while running plugin
'starcluster.plugins.sge.SGEPlugin':
!!! ERROR - remote command 'source /etc/profile && qconf -de node006'
!!! ERROR - failed with status 1:
!!! ERROR - denied: execution host "node006" does not exist
So I forcibly removed them. When I do that, I get messages like this for
each node:
>>> Terminating node: node006 (i-d16e4815)
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node005 from SGE
!!! ERROR - Error occured while running plugin
'starcluster.plugins.sge.SGEPlugin':
Has anyone experienced this? If so, what is causing this?
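
To make the mismatch concrete, here is a minimal sketch of the check I am effectively doing by hand: compare the node names that listclusters reports against the hosts SGE still knows about (for example, the output of "qconf -sel" on the master). The function name and the sample data are hypothetical and only for illustration; this is not part of StarCluster.

```python
def find_orphaned_nodes(cluster_nodes, sge_exec_hosts, master="master"):
    """Return compute nodes that are still in the cluster but are no
    longer registered as SGE execution hosts."""
    # Normalise the SGE host list: strip whitespace, drop blank lines.
    sge = set(h.strip() for h in sge_exec_hosts if h.strip())
    # The master is skipped because it is never a removal candidate.
    return sorted(n for n in cluster_nodes if n != master and n not in sge)


# Hypothetical sample data: node names as listclusters would show them,
# and the text that "qconf -sel" might print on the master.
cluster_nodes = ["master", "node001", "node005", "node006"]
qconf_sel_output = "master\nnode001\n"

orphans = find_orphaned_nodes(cluster_nodes, qconf_sel_output.splitlines())
print(orphans)  # ['node005', 'node006']
```

Any node this returns is in the state described above: running in the cluster, but already unknown to SGE, so a later "qconf -de" for it fails with "does not exist".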
Herc
Received on Thu Dec 10 2015 - 01:00:03 EST