StarCluster - Mailing List Archive

Error removing node from SGE

From: David Erickson <no email>
Date: Sat, 02 Feb 2013 10:56:32 -0800

Hi all,
I see the following in my logs whenever the loadbalancer removes a
node:

*** WARNING - Removing node013: i-dc5dd8ac (ec2-54-234-124-79.compute-1.amazonaws.com)
>>> Running plugin dnrc-cplex
>>> Removing node node013 (i-dc5dd8ac)...
>>> Removing node013 from known_hosts files
>>> Removing node013 from /etc/hosts
>>> Removing node013 from NFS
>>> Removing node013 from SGE
!!! ERROR - command 'source /etc/profile && qconf -dconf node013' failed with status 1
>>> Updating SGE parallel environment 'orte'
19/19 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Removing node node013 (i-dc5dd8ac)...
>>> Removing node013 from known_hosts files
>>> Removing node013 from /etc/hosts
>>> Removing node013 from NFS
>>> Canceling spot request sir-d69dda14
>>> Terminating node: node013 (i-dc5dd8ac)

The node does eventually get removed, but the qconf -dconf step fails
with status 1 every time a node is removed.
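
One possible cause, which is only a guess and is not confirmed by the log
above, is that node013 has no host-local SGE configuration, so there is
nothing for qconf -dconf to delete. A minimal sketch of a guarded removal
is below. It assumes the SGE environment is already set up in the calling
process (the logged command sources /etc/profile first) and that
qconf -sconfl prints one host name per line. The helper names
has_local_conf and remove_local_conf are made up for illustration and are
not part of StarCluster.

```python
import subprocess


def has_local_conf(sconfl_output, host):
    """Return True if `host` appears in the text printed by `qconf -sconfl`.

    Assumption: the listing contains one host name per line. Names are
    compared on their short form so that "node013" also matches
    "node013.ec2.internal".
    """
    short = host.split(".")[0]
    for line in sconfl_output.splitlines():
        name = line.strip()
        if name and name.split(".")[0] == short:
            return True
    return False


def remove_local_conf(host):
    """Delete the host's local SGE configuration only if one is listed.

    Returns True if `qconf -dconf` was run, False if it was skipped.
    Assumes `qconf` is on PATH and the SGE environment variables are set.
    """
    listing = subprocess.run(
        ["qconf", "-sconfl"], capture_output=True, text=True, check=True
    ).stdout
    if not has_local_conf(listing, host):
        return False  # nothing to delete, so skip the call that would fail
    subprocess.run(["qconf", "-dconf", host], check=True)
    return True
```

Running qconf -sconfl by hand on the master and looking for node013 gives
the same answer without any code. If the node is not listed, the status 1
would be harmless; if it is listed, the cause is something else.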

Thanks,
David
Received on Sat Feb 02 2013 - 13:58:24 EST