StarCluster - Mailing List Archive

reboot crash report

From: Ryan Golhar <no email>
Date: Tue, 23 Jul 2013 14:04:20 -0400

[ec2-user@ip-xxxxxxxx ~]$ starcluster reboot ngscluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Running plugin setupuserenv.SetupUserEnvironment
>>> Running plugin starcluster.plugins.users.CreateUsers
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Rebooting cluster...
>>> Sleeping for 20 seconds...
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 3.078 mins
>>> The master node is xxxxxxxxxx.compute-1.amazonaws.com
>>> Configuring cluster...
>>> Volume vol-xxxxxxx already attached to master...skipping
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-xxxxx on /share/ngs...
>>> Creating cluster user: sgeadmin (uid: 1001, gid: 1001)
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
21/21 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s): /home /share/ngs
>>> Mounting all NFS export path(s) on 20 worker node(s)
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - error occurred in job (id=node015): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/threadpool.py", line 31, in run
    job.run()
  File "/usr/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/threadpool.py", line 58, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/node.py", line 719, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/usr/lib/python2.6/site-packages/StarCluster-0.9999-py2.6.egg/starcluster/sshutils/__init__.py", line 538, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up


!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to: /home/ec2-user/.starcluster/logs/crash-report-18615.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu
[ec2-user@ip-10-28-206-211 ~]$


 crash-report-18615.txt<https://docs.google.com/file/d/0B4m4P8rKlwW2VW50TzV2Tl9JVzA/edit?usp=drive_web>
Received on Tue Jul 23 2013 - 14:04:22 EDT