StarCluster - Mailing List Archive

[Star Cluster] NoneType user errors when removing nodes

From: Jin Yu <no email>
Date: Sun, 20 Jul 2014 15:08:22 -0500

Hello,

I keep encountering errors when trying to remove nodes using the
load balancer.

From the error messages (appended below), the failure involves the
user object. I am just using the default user "sgeadmin".

I logged in to the node that StarCluster tried to remove and verified that
the following steps had been completed:

1. the node has been removed from SGE
2. NFS has been unmounted
3. sgeadmin user has been deleted
4. the hosts file has no IPs of any other nodes or of the master instance

But this node is not terminated and still shows up when I run "starcluster lc".
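
The AttributeError in the traceback below is consistent with the user lookup
returning None once the sgeadmin account has been deleted (step 3 above).
Here is a minimal sketch of that failure mode. It is only an illustration:
lookup_user, known_hosts_path, PasswdEntry and the dict-based passwd table
are hypothetical stand-ins, not StarCluster's own code, and I am assuming
the real lookup returns None for a missing user.

```python
import posixpath
from collections import namedtuple

# Hypothetical stand-in for a passwd entry; only pw_dir matters here.
PasswdEntry = namedtuple("PasswdEntry", ["pw_name", "pw_dir"])


def lookup_user(passwd_db, username):
    """Return the entry for username, or None if the account is gone."""
    return passwd_db.get(username)


def known_hosts_path(user):
    """Build ~/.ssh/known_hosts, failing clearly if the user is missing."""
    if user is None:
        raise ValueError("user account not found; cannot locate known_hosts")
    return posixpath.join(user.pw_dir, ".ssh", "known_hosts")


# Before removal: sgeadmin still exists on the node.
passwd_db = {"sgeadmin": PasswdEntry("sgeadmin", "/home/sgeadmin")}
path_before = known_hosts_path(lookup_user(passwd_db, "sgeadmin"))

# After the account is deleted, the lookup yields None (assumption).
del passwd_db["sgeadmin"]
missing_user = lookup_user(passwd_db, "sgeadmin")

try:
    # Same expression as the failing line in node.py from the traceback.
    posixpath.join(missing_user.pw_dir, ".ssh", "known_hosts")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

If this guess is right, the known_hosts cleanup runs after the user has
already been removed, so a check like the one in known_hosts_path would at
least turn the crash into a clearer error.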


Thanks!
Jin



>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node037 from SGE
>>> Updating SGE parallel environment 'orte'
50/50 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Removing node node037 (i-1f013c34)...
>>> Removing node037 from known_hosts files
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - Failed to remove node node037
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 754, in _eval_remove_node
    self._cluster.remove_node(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1050, in remove_node
    force=force)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1076, in remove_nodes
    reverse=True)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1690, in run_plugins
    self.run_plugin(plug, method_name=method_name, node=node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1715, in run_plugin
    func(*args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 407, in on_remove_node
    self._remove_from_known_hosts(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 397, in _remove_from_known_hosts
    n.remove_from_known_hosts(self._user, [node])
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/node.py", line 588, in remove_from_known_hosts
    known_hosts_file = posixpath.join(user.pw_dir, '.ssh', 'known_hosts')
AttributeError: 'NoneType' object has no attribute 'pw_dir'
>>> Sleeping...(looping again in 60 secs)
Received on Sun Jul 20 2014 - 16:08:23 EDT
This archive was generated by hypermail 2.3.0.
