StarCluster - Mailing List Archive

Fwd: Recover cluster after an error when starting

From: Milton Pividori <no email>
Date: Tue, 5 Nov 2013 15:03:55 -0200

Sorry, I forgot to include the list.

---------- Forwarded message ----------
From: Milton Pividori <miltondp_at_gmail.com>
Date: 2013/11/5
Subject: Re: [StarCluster] Recover cluster after an error when starting
To: "MacMullan, Hugh" <hughmac_at_wharton.upenn.edu>


Thank you Hugh, I just discovered what "restart" does. I will try it next
time.

For now, though, I increased the NFS mount timeout by editing the
mount_nfs_shares function in starcluster/node.py, line 725 (I am using
StarCluster 0.94.2). I added the mount option "timeo=20", and it worked.

Maybe it would be good to have a "timeout" option in the config file.
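
For anyone who wants to try the same workaround, here is a minimal sketch of
the idea. It is not the actual StarCluster code. The function name
nfs_fstab_line and the base option list are assumptions made up for
illustration; only the "timeo" option comes from the change described above.
Per nfs(5), timeo is given in tenths of a second, so it is worth checking
which value suits your cluster.

```python
def nfs_fstab_line(server, path, timeo=None):
    """Build an /etc/fstab entry for an NFS share (illustrative sketch only)."""
    # Base options are an assumption for this sketch, not copied from StarCluster.
    options = ["vers=3", "user", "rw", "exec", "noauto"]
    if timeo is not None:
        # nfs(5): timeo is expressed in tenths of a second.
        options.append("timeo=%d" % timeo)
    return "%s:%s %s nfs %s 0 0" % (server, path, path, ",".join(options))


# Example: the entry a worker node would use to mount master:/home
print(nfs_fstab_line("master", "/home", timeo=20))
```

A "timeout" setting in the config file could then simply be passed through as
the timeo argument, leaving the default behaviour unchanged when it is unset.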

Thank you again!


2013/11/5 MacMullan, Hugh <hughmac_at_wharton.upenn.edu>

> Hi Milton:
>
>
>
> I would generally do a restart (starcluster restart mycluster).
>
>
>
> -Hugh
>
>
>
> *From:* starcluster-bounces_at_mit.edu [mailto:starcluster-bounces_at_mit.edu] *On
> Behalf Of *Milton Pividori
> *Sent:* Tuesday, November 05, 2013 11:50 AM
> *To:* starcluster_at_mit.edu
> *Subject:* [StarCluster] Recover cluster after an error when starting
>
>
>
> Hi all,
>
>
>
> I am a new user of StarCluster. First of all, thank you for this great
> software!
>
>
>
> My question is about how to recover a cluster after an error occurs while
> starting it. After I ran "starcluster start mycluster" I got a timeout
> error when mounting the /home directory (EBS volume). Is it possible to run
> the plugin again? In this case, I think the plugin is
> "starcluster.clustersetup.DefaultClusterSetup".
>
>
>
> This is the last part of the error I get (the cluster has 10 t1.micro
> nodes):
>
>
>
> >>> Starting NFS server on master
> >>> Configuring NFS exports path(s):
> /home
> >>> Mounting all NFS export path(s) on 9 worker node(s)
> 9/9 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> !!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
> !!! ERROR - error occurred in job (id=node009): remote command 'source /etc/profile && mount /home' failed with status 32:
> mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 48, in run
>     job.run()
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 75, in run
>     r = self.method(*self.args, **self.kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/node.py", line 731, in mount_nfs_shares
>     self.ssh.execute('mount %s' % path)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
>     msg, command, exit_status, out_str)
> RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
> mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
>
>
>
> Thank you!
>
>
>
> --
> Milton Pividori
> Blog: www.miltonpividori.com.ar
>



-- 
Milton Pividori
Blog: www.miltonpividori.com.ar
Received on Tue Nov 05 2013 - 12:04:18 EST
This archive was generated by hypermail 2.3.0.
