Hi Folks,
I hope someone can shed light on the following new failure mode (crash report
attached). By the way, a previous, similar attempt to add 2 nodes to this
cluster hung slightly earlier in the NFS sharing process.
root@AWS-VTMXvcl /opt/awsutils/VI-utils
# tail -f /var/log/VI-addnodes/addnode.log
StarCluster - (http://star.mit.edu/cluster) (v. 0.94.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu
>>> Launching node(s): node002, node003
Reservation:r-288a114b
>>> Waiting for instances to propagate...
>>> Waiting for node(s) to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.739 mins
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring NFS exports path(s):
/home /usr/share/jobs/
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - error occurred in job (id=node002): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: access denied by server while mounting master:/home
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/StarCluster-0.94.3-py2.6.egg/starcluster/threadpool.py", line 48, in run
    job.run()
  File "/usr/lib/python2.6/site-packages/StarCluster-0.94.3-py2.6.egg/starcluster/threadpool.py", line 75, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.94.3-py2.6.egg/starcluster/node.py", line 731, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.94.3-py2.6.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: access denied by server while mounting master:/home
!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to: /root/.starcluster/logs/crash-report-15021.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu
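For anyone diagnosing the same error: "access denied by server" is one possible symptom of the client host (here node002) not being listed for that path in the master's /etc/exports, or of the export table not having been reloaded after the entry was added. The minimal Python sketch below checks an exports file for a given host. It assumes the simple one-entry-per-line "path client(options) client(options) ..." format; line continuations, quoted paths, IP/CIDR clients, and netgroups are not handled. The function names and the sample file contents are hypothetical and not taken from this cluster.

```python
import fnmatch


def _norm(path):
    """Treat '/usr/share/jobs/' and '/usr/share/jobs' as the same export."""
    return path.rstrip("/") or "/"


def parse_exports(text):
    """Map each exported path to the list of client patterns allowed to mount it.

    Assumes simple 'path client(options) client(options) ...' lines; comments
    and blank lines are skipped, but backslash continuations, quoted paths,
    and per-export default options ('-opts') are not handled.
    """
    exports = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        fields = line.split()
        path, clients = _norm(fields[0]), fields[1:]
        patterns = exports.setdefault(path, [])
        for client in clients:
            # 'node002(rw,async)' -> 'node002'; a bare '(rw)' means every host.
            patterns.append(client.split("(", 1)[0] or "*")
    return exports


def is_exported_to(exports, path, host):
    """True if some client pattern for `path` matches `host` (shell-style wildcards)."""
    return any(fnmatch.fnmatchcase(host, pattern)
               for pattern in exports.get(_norm(path), []))


# Hypothetical exports file in which node002 was never added.
sample = """\
/home node001(async,no_root_squash,no_subtree_check,rw)
/usr/share/jobs/ node001(async,no_root_squash,no_subtree_check,rw)
"""
exports = parse_exports(sample)
print(is_exported_to(exports, "/home", "node001"))  # True
print(is_exported_to(exports, "/home", "node002"))  # False
```

On the master, one could pass the real contents of /etc/exports together with the new node's alias; a False result for /home would be consistent with the error above, while a True result would point elsewhere (for example, hostname resolution or a stale export table).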
Thanks in advance for any advice, fix, workaround -- anything.
Regards,
Lyn
Received on Thu Dec 05 2013 - 21:36:25 EST