StarCluster - Mailing List Archive

bug report regarding NFS mounting on "addnode"

From: Nik Krumm <no email>
Date: Wed, 9 Oct 2013 15:16:02 -0700

Hi Justin,

I just ran into the following bug when running "starcluster addnode" on an existing cluster. The only thing worth noting is that the previous nodes on the cluster had all been terminated after being outbid, so perhaps that is playing a role?
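
One detail in the log below that may be relevant: after the regex cleanup, one node's /etc/hosts still carries an entry for node002 (10.232.9.232), presumably left over from the terminated nodes. For what it's worth, these are the checks I'd run by hand on the master to see whether the new node is actually allowed by the export list (just a sketch of generic NFS diagnostics, nothing StarCluster-specific):

    # List what the master currently exports, and to which clients
    exportfs -v

    # Ask the NFS server for its export list as clients would see it
    showmount -e master

    # Look for stale entries left over from the terminated nodes
    cat /etc/exports

    # Confirm that node001 resolves to the new instance's address
    getent hosts node001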

The crash log is included below. Let me know if I can help.
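
If the problem really is a stale export list, something along these lines on the master might unblock the mount in the meantime (a sketch I haven't verified; the export options are generic NFS defaults, not necessarily the exact ones StarCluster writes):

    # On the master: export the shared paths to the new node and re-export
    echo '/home node001(rw,async,no_root_squash,no_subtree_check)' >> /etc/exports
    echo '/data node001(rw,async,no_root_squash,no_subtree_check)' >> /etc/exports
    exportfs -fra

    # Then, on node001: retry the mounts that failed
    mount /home
    mount /data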

Thanks
Nik



---------- SYSTEM INFO ----------
StarCluster: 0.9999
Python: 2.7.3 |EPD_free 7.3-1 (32-bit)| (default, Apr 12 2012, 11:28:34) [GCC 4.0.1 (Apple Inc. build 5493)]
Platform: Darwin-10.8.0-i386-32bit
boto: 2.10.0
paramiko: 1.11.0
Crypto: 2.6

---------- CRASH DETAILS ----------
Command: starcluster addnode test

2013-10-09 14:39:47,035 PID: 89616 config.py:567 - DEBUG - Loading config
2013-10-09 14:39:47,035 PID: 89616 config.py:138 - DEBUG - Loading file: /Users/nkrumm/.starcluster/config
2013-10-09 14:39:47,043 PID: 89616 awsutils.py:74 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': None, 'validate_certs': True, 'port': None}
2013-10-09 14:39:47,778 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {}
2013-10-09 14:39:47,778 PID: 89616 cluster.py:719 - DEBUG - adding node i-94fff1ef to self._nodes list
2013-10-09 14:39:47,779 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>]
2013-10-09 14:39:47,779 PID: 89616 awsutils.py:74 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': None, 'validate_certs': True, 'port': None}
2013-10-09 14:39:48,551 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>}
2013-10-09 14:39:48,552 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:39:48,552 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>]
2013-10-09 14:39:48,693 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>}
2013-10-09 14:39:48,693 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:39:48,693 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>]
2013-10-09 14:39:48,693 PID: 89616 cluster.py:860 - DEBUG - Highest node number is 0. choosing 1.
2013-10-09 14:39:48,693 PID: 89616 cluster.py:902 - INFO - Launching node(s): node001
2013-10-09 14:39:48,699 PID: 89616 cluster.py:801 - DEBUG - Userdata size in KB: 0.55
2013-10-09 14:39:49,081 PID: 89616 cluster.py:801 - DEBUG - Userdata size in KB: 0.55
2013-10-09 14:39:49,469 PID: 89616 cluster.py:847 - INFO - SpotInstanceRequest:sir-14231032
2013-10-09 14:39:49,469 PID: 89616 utils.py:624 - INFO - Waiting for spot requests to propagate...
2013-10-09 14:39:50,659 PID: 89616 cluster.py:1298 - INFO - Waiting for node(s) to come up... (updating every 30s)
2013-10-09 14:39:50,790 PID: 89616 cluster.py:1235 - INFO - Waiting for open spot requests to become active...
2013-10-09 14:41:21,246 PID: 89616 utils.py:624 - INFO - Waiting for instances to propagate...
2013-10-09 14:41:22,518 PID: 89616 cluster.py:1255 - INFO - Waiting for all nodes to be in a 'running' state...
2013-10-09 14:41:22,651 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>}
2013-10-09 14:41:22,651 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:41:22,651 PID: 89616 cluster.py:719 - DEBUG - adding node i-b13fc6c9 to self._nodes list
2013-10-09 14:41:22,651 PID: 89616 awsutils.py:74 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': None, 'validate_certs': True, 'port': None}
2013-10-09 14:41:23,565 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:41:54,151 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:41:54,151 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:41:54,151 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:41:54,151 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:42:24,297 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:42:24,298 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:42:24,298 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:42:24,298 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:42:54,534 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:42:54,534 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:42:54,535 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:42:54,535 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:43:24,828 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:43:24,828 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:43:24,828 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:43:24,828 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:43:24,828 PID: 89616 cluster.py:1283 - INFO - Waiting for SSH to come up on all nodes...
2013-10-09 14:43:24,951 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:43:24,952 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:43:24,952 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:43:24,952 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:43:24,958 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 2
2013-10-09 14:43:25,338 PID: 89616 __init__.py:86 - DEBUG - loading private key /net/eichler/vol8/home/nkrumm/.ssh/ndar_aws_key.rsa
2013-10-09 14:43:25,340 PID: 89616 __init__.py:178 - DEBUG - Using private key /net/eichler/vol8/home/nkrumm/.ssh/ndar_aws_key.rsa (rsa)
2013-10-09 14:43:25,340 PID: 89616 __init__.py:108 - DEBUG - connecting to host ec2-54-225-32-49.compute-1.amazonaws.com on port 22 as user root
2013-10-09 14:43:25,360 PID: 89616 __init__.py:86 - DEBUG - loading private key /net/eichler/vol8/home/nkrumm/.ssh/ndar_aws_key.rsa
2013-10-09 14:43:25,361 PID: 89616 __init__.py:178 - DEBUG - Using private key /net/eichler/vol8/home/nkrumm/.ssh/ndar_aws_key.rsa (rsa)
2013-10-09 14:43:25,362 PID: 89616 __init__.py:108 - DEBUG - connecting to host ec2-54-221-172-46.compute-1.amazonaws.com on port 22 as user root
2013-10-09 14:43:25,982 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 2
2013-10-09 14:43:34,706 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 2
2013-10-09 14:43:43,185 PID: 89616 __init__.py:197 - DEBUG - creating sftp connection
2013-10-09 14:43:44,184 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:45,089 PID: 89616 __init__.py:197 - DEBUG - creating sftp connection
2013-10-09 14:43:45,185 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:46,185 PID: 89616 utils.py:117 - INFO - Waiting for cluster to come up took 3.925 mins
2013-10-09 14:43:46,186 PID: 89616 cluster.py:912 - DEBUG - Adding node(s): ['node001']
2013-10-09 14:43:46,313 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:43:46,313 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:43:46,313 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:43:46,314 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:43:46,436 PID: 89616 cluster.py:711 - DEBUG - existing nodes: {u'i-94fff1ef': <Node: master (i-94fff1ef)>, u'i-b13fc6c9': <Node: node001 (i-b13fc6c9)>}
2013-10-09 14:43:46,436 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-94fff1ef in self._nodes
2013-10-09 14:43:46,436 PID: 89616 cluster.py:714 - DEBUG - updating existing node i-b13fc6c9 in self._nodes
2013-10-09 14:43:46,436 PID: 89616 cluster.py:727 - DEBUG - returning self._nodes = [<Node: master (i-94fff1ef)>, <Node: node001 (i-b13fc6c9)>]
2013-10-09 14:43:46,437 PID: 89616 cluster.py:1571 - INFO - Running plugin starcluster.clustersetup.DefaultClusterSetup
2013-10-09 14:43:46,437 PID: 89616 clustersetup.py:121 - INFO - Configuring hostnames...
2013-10-09 14:43:46,443 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:46,755 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && hostname -F /etc/hostname
2013-10-09 14:43:46,839 PID: 89616 __init__.py:562 - DEBUG - output of 'source /etc/profile && hostname -F /etc/hostname':

2013-10-09 14:43:47,444 PID: 89616 clustersetup.py:247 - INFO - Configuring /etc/hosts on each node
2013-10-09 14:43:47,445 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 2
2013-10-09 14:43:47,715 PID: 89616 __init__.py:296 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
10.232.9.232 node002

2013-10-09 14:43:47,725 PID: 89616 __init__.py:296 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

2013-10-09 14:43:48,446 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 2
2013-10-09 14:43:49,447 PID: 89616 node.py:660 - INFO - Configuring NFS exports path(s):
/home /data
2013-10-09 14:43:50,018 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && exportfs -fra
2013-10-09 14:43:50,101 PID: 89616 __init__.py:562 - DEBUG - output of 'source /etc/profile && exportfs -fra':

2013-10-09 14:43:50,101 PID: 89616 clustersetup.py:347 - INFO - Mounting all NFS export path(s) on 1 worker node(s)
2013-10-09 14:43:50,102 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:50,203 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && /etc/init.d/portmap start
2013-10-09 14:43:50,329 PID: 89616 __init__.py:562 - DEBUG - output of 'source /etc/profile && /etc/init.d/portmap start':
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service portmap start

Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the start(8) utility, e.g. start portmap
2013-10-09 14:43:50,530 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && mount -t devpts none /dev/pts
2013-10-09 14:43:50,613 PID: 89616 __init__.py:559 - DEBUG - (ignored) remote command 'source /etc/profile && mount -t devpts none /dev/pts' failed with status 32:
mount: none already mounted or /dev/pts busy
mount: according to mtab, devpts is already mounted on /dev/pts
2013-10-09 14:43:50,814 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && mount
2013-10-09 14:43:50,897 PID: 89616 __init__.py:562 - DEBUG - output of 'source /etc/profile && mount':
/dev/xvda1 on / type ext4 (rw)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
/dev/xvdb1 on /mnt type ext3 (rw,_netdev)
2013-10-09 14:43:51,103 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:51,284 PID: 89616 __init__.py:296 - DEBUG - new /etc/fstab after removing regex ( /home | /data ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb1 /mnt auto defaults,nobootwait,comment=cloudconfig 0 2

2013-10-09 14:43:52,084 PID: 89616 __init__.py:538 - DEBUG - executing remote command: source /etc/profile && mount /home
2013-10-09 14:43:52,103 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:53,104 PID: 89616 threadpool.py:168 - DEBUG - unfinished_tasks = 1
2013-10-09 14:43:54,104 PID: 89616 cluster.py:1581 - ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
2013-10-09 14:43:54,104 PID: 89616 cli.py:284 - ERROR - error occurred in job (id=node001): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: access denied by server while mounting master:/home
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/threadpool.py", line 48, in run
    job.run()
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/threadpool.py", line 75, in run
    r = self.method(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/node.py", line 730, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: access denied by server while mounting master:/home
Received on Wed Oct 09 2013 - 18:16:10 EDT