[Star Cluster] NoneType user errors when removing nodes
Hello,
I keep getting errors when trying to remove nodes using the load
balancer.
From the error messages (appended below), the error concerns the user
object. I am just using the default user "sgeadmin".
I logged in to the node that failed to be removed and can verify that the
following steps have been done:
1. the node has been removed from SGE
2. NFS has been unmounted
3. sgeadmin user has been deleted
4. the hosts file has no IPs of any other nodes or the master instance
But this node is not terminated and still shows up when I run "starcluster lc".
Thanks!
Jin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node037 from SGE
>>> Updating SGE parallel environment 'orte'
50/50 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Removing node node037 (i-1f013c34)...
>>> Removing node037 from known_hosts files
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - Failed to remove node node037
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 754, in _eval_remove_node
    self._cluster.remove_node(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1050, in remove_node
    force=force)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1076, in remove_nodes
    reverse=True)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1690, in run_plugins
    self.run_plugin(plug, method_name=method_name, node=node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1715, in run_plugin
    func(*args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 407, in on_remove_node
    self._remove_from_known_hosts(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 397, in _remove_from_known_hosts
    n.remove_from_known_hosts(self._user, [node])
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/node.py", line 588, in remove_from_known_hosts
    known_hosts_file = posixpath.join(user.pw_dir, '.ssh', 'known_hosts')
AttributeError: 'NoneType' object has no attribute 'pw_dir'
>>> Sleeping...(looping again in 60 secs)
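
For reference, here is a minimal sketch of how I read the last frame of the traceback. It assumes the "user" object is a passwd-style entry and that the lookup returned None because the account no longer exists on the machine where the lookup ran. I have not confirmed this against the StarCluster source. The helper names lookup_user and known_hosts_path are hypothetical and are not StarCluster functions; only the standard pwd and posixpath modules are real.

```python
import posixpath
import pwd


def lookup_user(username):
    """Return the passwd entry for username, or None if the account is missing.

    Hypothetical helper: pwd.getpwnam raises KeyError for an unknown user,
    and this wrapper turns that into None.
    """
    try:
        return pwd.getpwnam(username)
    except KeyError:
        return None


def known_hosts_path(user):
    """Build the known_hosts path for a passwd-style entry.

    Fails with a clear message instead of the AttributeError seen in the
    traceback when the entry is None.
    """
    if user is None:
        raise ValueError("user account not found; cannot locate known_hosts")
    return posixpath.join(user.pw_dir, '.ssh', 'known_hosts')
```

If this reading is right, calling known_hosts_path(lookup_user("sgeadmin")) after the sgeadmin account has been deleted would hit the None case, which matches the AttributeError on pw_dir above.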
Received on Sun Jul 20 2014 - 16:08:23 EDT