StarCluster - Mailing List Archive

Re: Recover cluster after an error when starting

From: MacMullan, Hugh <no email>
Date: Tue, 5 Nov 2013 16:55:17 +0000

Hi Milton:

I would generally do a restart (starcluster restart mycluster).
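
If the mount timeout turns out to be intermittent, the restart could be scripted with a retry. The following is only a sketch, assuming the `starcluster` command is on the PATH and that it exits with a non-zero status when setup fails. The function name `restart_cluster`, the attempt count, and the wait time are hypothetical and not part of StarCluster.

```python
import subprocess
import time


def restart_cluster(cluster_tag, attempts=2, wait_seconds=60,
                    run=subprocess.call, sleep=time.sleep):
    """Run `starcluster restart <cluster_tag>` until it exits with status 0.

    `run` and `sleep` are injectable so the logic can be exercised
    without a real cluster. Returns True on success, False otherwise.
    """
    # Only the command quoted in this thread is used here.
    cmd = ["starcluster", "restart", cluster_tag]
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            # Assumption: pausing gives the NFS server on master time to settle.
            sleep(wait_seconds)
    return False
```

Calling `restart_cluster("mycluster")` would run the restart at most twice, pausing a minute between attempts. Whether a second attempt helps depends on why the NFS mount timed out in the first place.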

-Hugh

From: starcluster-bounces_at_mit.edu [mailto:starcluster-bounces_at_mit.edu] On Behalf Of Milton Pividori
Sent: Tuesday, November 05, 2013 11:50 AM
To: starcluster_at_mit.edu
Subject: [StarCluster] Recover cluster after an error when starting

Hi all,

I am a new user of StarCluster. First of all, thank you for this great software!

My question is how to recover a cluster after an error while starting it. After I ran "starcluster start mycluster", I got a timeout error when mounting the /home directory (an EBS volume). Is it possible to run just the failed plugin again? In this case, I think the plugin is "starcluster.clustersetup.DefaultClusterSetup".

This is the last part of the error I get (the cluster size is 10 with t1.micro):

>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 9 worker node(s)
9/9 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - error occurred in job (id=node009): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 48, in run
    job.run()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 75, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/node.py", line 731, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up

Thank you!

--
Milton Pividori
Blog: http://www.miltonpividori.com.ar
Received on Tue Nov 05 2013 - 11:55:24 EST