StarCluster - Mailing List Archive

Re: EBS volume not mounting on restart (more logs)

From: Larour (385J)
Date: Tue, 9 Oct 2012 16:20:33 +0000

Dear all,

I thought I would add the following debug logs to help diagnose my issue.
The logs for the start, stop, and restart phases are separated by two dashed divider lines:

First log, when I run the command:
starcluster start issm
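
For reference, here is roughly the relevant part of my /Users/larour/.starcluster/config. I am reconstructing it from the values that appear in the logs below (AMI, instance type, volume ID, key path), so treat it as a sketch rather than a verbatim copy:

    [key ISSMStarCluster]
    KEY_LOCATION = ~/.ssh/ISSMStarCluster.rsa

    [volume data]
    # the EBS volume that fails to come back on restart
    VOLUME_ID = vol-7d113b07
    MOUNT_PATH = /data

    [cluster issm]
    KEYNAME = ISSMStarCluster
    CLUSTER_SIZE = 2
    CLUSTER_USER = issm
    NODE_IMAGE_ID = ami-4583572c
    NODE_INSTANCE_TYPE = cc2.8xlarge
    VOLUMES = data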

-----------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------


2012-10-09 08:23:45,433 PID: 41555 config.py:551 - DEBUG - Loading config
2012-10-09 08:23:45,434 PID: 41555 config.py:118 - DEBUG - Loading file: /Users/larour/.starcluster/config
2012-10-09 08:23:50,995 PID: 41556 config.py:551 - DEBUG - Loading config
2012-10-09 08:23:50,995 PID: 41556 config.py:118 - DEBUG - Loading file: /Users/larour/.starcluster/config
2012-10-09 08:23:50,998 PID: 41556 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-east-1, 'port': None}
2012-10-09 08:23:51,490 PID: 41556 start.py:176 - INFO - Using default cluster template: issm
2012-10-09 08:23:51,491 PID: 41556 cluster.py:1630 - INFO - Validating cluster template settings...
2012-10-09 08:23:52,839 PID: 41556 cluster.py:951 - DEBUG - Launch map: node001 (ami: ami-4583572c, type: cc2.8xlarge)...
2012-10-09 08:23:52,966 PID: 41556 cluster.py:1646 - INFO - Cluster template settings are valid
2012-10-09 08:23:52,967 PID: 41556 cluster.py:1460 - INFO - Starting cluster...
2012-10-09 08:23:52,967 PID: 41556 cluster.py:977 - INFO - Launching a 2-node cluster...
2012-10-09 08:23:52,967 PID: 41556 cluster.py:951 - DEBUG - Launch map: node001 (ami: ami-4583572c, type: cc2.8xlarge)...
2012-10-09 08:23:52,968 PID: 41556 cluster.py:1004 - DEBUG - Launching master (ami: ami-4583572c, type: cc2.8xlarge)
2012-10-09 08:23:52,968 PID: 41556 cluster.py:1004 - DEBUG - Launching node001 (ami: ami-4583572c, type: cc2.8xlarge)
2012-10-09 08:23:53,320 PID: 41556 awsutils.py:165 - INFO - Creating security group @sc-issm...
2012-10-09 08:23:56,152 PID: 41556 awsutils.py:273 - INFO - Creating placement group @sc-issm...
2012-10-09 08:23:57,974 PID: 41556 cluster.py:793 - INFO - Reservation:r-9de939fb
2012-10-09 08:23:57,975 PID: 41556 cluster.py:1267 - INFO - Waiting for cluster to come up... (updating every 10s)
2012-10-09 08:23:58,552 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {}
2012-10-09 08:23:58,553 PID: 41556 cluster.py:680 - DEBUG - adding node i-321c5a4f to self._nodes list
2012-10-09 08:23:59,143 PID: 41556 cluster.py:680 - DEBUG - adding node i-2c1c5a51 to self._nodes list
2012-10-09 08:23:59,799 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:23:59,800 PID: 41556 cluster.py:1235 - INFO - Waiting for all nodes to be in a 'running' state...
2012-10-09 08:23:59,944 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:23:59,944 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:23:59,944 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:23:59,944 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:24:10,081 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:24:10,082 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:24:10,082 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:24:10,082 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:24:20,227 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:24:20,227 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:24:20,228 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:24:20,228 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:24:30,389 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:24:30,390 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:24:30,390 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:24:30,390 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:24:40,526 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:24:40,526 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:24:40,526 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:24:40,527 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:24:50,658 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:24:50,658 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:24:50,658 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:24:50,658 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:25:00,793 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:25:00,793 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:25:00,793 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:25:00,793 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:25:10,980 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:25:10,981 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:25:10,981 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:25:10,981 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:25:10,982 PID: 41556 cluster.py:1253 - INFO - Waiting for SSH to come up on all nodes...
2012-10-09 08:25:11,112 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:25:11,112 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:25:11,112 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:25:11,112 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:25:11,115 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:11,262 PID: 41556 __init__.py:75 - DEBUG - loading private key /Users/larour/.ssh/ISSMStarCluster.rsa
2012-10-09 08:25:11,263 PID: 41556 __init__.py:167 - DEBUG - Using private key /Users/larour/.ssh/ISSMStarCluster.rsa (rsa)
2012-10-09 08:25:11,263 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-23-41-97.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:25:11,603 PID: 41556 __init__.py:75 - DEBUG - loading private key /Users/larour/.ssh/ISSMStarCluster.rsa
2012-10-09 08:25:11,604 PID: 41556 __init__.py:167 - DEBUG - Using private key /Users/larour/.ssh/ISSMStarCluster.rsa (rsa)
2012-10-09 08:25:11,605 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-20-180-11.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:25:12,117 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:13,119 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:14,119 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:15,121 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:16,123 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:17,124 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:18,125 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:19,127 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:20,129 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:21,130 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:22,132 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:23,133 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:24,134 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:25,135 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:26,136 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:27,137 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:28,138 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:29,139 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:30,140 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:31,142 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:32,143 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:33,144 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:34,146 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:35,148 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:36,148 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:37,150 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:38,151 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:39,152 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:40,154 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:41,155 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:42,157 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:43,158 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:44,160 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:45,161 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:46,162 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:47,164 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:48,165 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:49,166 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:50,168 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:51,169 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:51,492 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-23-41-97.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:25:51,828 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-20-180-11.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:25:52,170 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:53,172 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:54,173 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:55,175 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:56,177 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:57,177 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:58,178 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:25:59,179 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:00,181 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:01,181 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:01,699 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-23-41-97.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:26:02,183 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:03,185 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:03,255 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-20-180-11.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:26:04,186 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:05,188 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:06,189 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:07,190 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:08,192 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:09,194 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:10,195 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:11,197 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:11,910 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-23-41-97.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:26:12,198 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:13,308 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:13,468 PID: 41556 __init__.py:97 - DEBUG - connecting to host ec2-23-20-180-11.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:26:14,236 PID: 41556 __init__.py:186 - DEBUG - creating sftp connection
2012-10-09 08:26:14,309 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:15,311 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:16,312 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:17,314 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:17,570 PID: 41556 __init__.py:186 - DEBUG - creating sftp connection
2012-10-09 08:26:18,315 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:19,316 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:20,317 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:21,319 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:22,320 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:23,321 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:24,323 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:25,324 PID: 41556 utils.py:97 - INFO - Waiting for cluster to come up took 2.456 mins
2012-10-09 08:26:25,325 PID: 41556 cluster.py:1487 - INFO - The master node is ec2-23-23-41-97.compute-1.amazonaws.com
2012-10-09 08:26:25,325 PID: 41556 cluster.py:1488 - INFO - Setting up the cluster...
2012-10-09 08:26:25,541 PID: 41556 cluster.py:1317 - INFO - Attaching volume vol-7d113b07 to master node on /dev/sdz ...
2012-10-09 08:26:25,813 PID: 41556 cluster.py:1319 - DEBUG - resp = attaching
2012-10-09 08:26:31,395 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:26:31,395 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:26:31,396 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:26:31,396 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:26:31,396 PID: 41556 clustersetup.py:90 - INFO - Configuring hostnames...
2012-10-09 08:26:31,400 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:33,158 PID: 41556 clustersetup.py:283 - INFO - Mounting EBS volume vol-7d113b07 on /data...
2012-10-09 08:26:33,338 PID: 41556 __init__.py:539 - DEBUG - /dev/sda1 on / type ext4 (rw)
2012-10-09 08:26:33,339 PID: 41556 __init__.py:539 - DEBUG - proc on /proc type proc (rw,noexec,nosuid,nodev)
2012-10-09 08:26:33,340 PID: 41556 __init__.py:539 - DEBUG - sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
2012-10-09 08:26:33,340 PID: 41556 __init__.py:539 - DEBUG - fusectl on /sys/fs/fuse/connections type fusectl (rw)
2012-10-09 08:26:33,341 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/debug type debugfs (rw)
2012-10-09 08:26:33,341 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/security type securityfs (rw)
2012-10-09 08:26:33,341 PID: 41556 __init__.py:539 - DEBUG - udev on /dev type devtmpfs (rw,mode=0755)
2012-10-09 08:26:33,341 PID: 41556 __init__.py:539 - DEBUG - devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
2012-10-09 08:26:33,342 PID: 41556 __init__.py:539 - DEBUG - tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
2012-10-09 08:26:33,342 PID: 41556 __init__.py:539 - DEBUG - none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
2012-10-09 08:26:33,342 PID: 41556 __init__.py:539 - DEBUG - none on /run/shm type tmpfs (rw,nosuid,nodev)
2012-10-09 08:26:33,342 PID: 41556 __init__.py:539 - DEBUG - /dev/xvdb on /mnt type ext3 (rw)
2012-10-09 08:26:33,751 PID: 41556 __init__.py:284 - DEBUG - new /etc/fstab after removing regex ( /data ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2

2012-10-09 08:26:35,765 PID: 41556 clustersetup.py:154 - INFO - Creating cluster user: None (uid: 1001, gid: 1001)
2012-10-09 08:26:35,765 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:36,067 PID: 41556 clustersetup.py:171 - DEBUG - user issm does not exist, creating...
2012-10-09 08:26:36,069 PID: 41556 clustersetup.py:171 - DEBUG - user issm does not exist, creating...
2012-10-09 08:26:36,768 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:37,769 PID: 41556 clustersetup.py:200 - INFO - Configuring scratch space for user(s): issm
2012-10-09 08:26:37,770 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:38,773 PID: 41556 clustersetup.py:209 - INFO - Configuring /etc/hosts on each node
2012-10-09 08:26:38,774 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:39,091 PID: 41556 __init__.py:284 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-156-164-228.ec2.internal ip-10-156-164-228

2012-10-09 08:26:39,131 PID: 41556 __init__.py:284 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-156-165-180.ec2.internal ip-10-156-165-180

2012-10-09 08:26:39,777 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:26:40,778 PID: 41556 node.py:616 - INFO - Starting NFS server on master
2012-10-09 08:26:40,963 PID: 41556 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:26:40,964 PID: 41556 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:26:40,964 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:26:40,964 PID: 41556 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:26:40,964 PID: 41556 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:26:41,248 PID: 41556 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs/' failed with status 32
2012-10-09 08:26:41,249 PID: 41556 __init__.py:539 - DEBUG - mount: unknown filesystem type 'rpc_pipefs'
2012-10-09 08:26:41,913 PID: 41556 __init__.py:539 - DEBUG - * Exporting directories for NFS kernel daemon...
2012-10-09 08:26:41,914 PID: 41556 __init__.py:539 - DEBUG - ...done.
2012-10-09 08:26:41,915 PID: 41556 __init__.py:539 - DEBUG - * Starting NFS kernel daemon
2012-10-09 08:26:41,915 PID: 41556 __init__.py:539 - DEBUG - ...done.
2012-10-09 08:26:41,915 PID: 41556 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:26:41,916 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:26:42,198 PID: 41556 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:26:42,199 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:26:42,200 PID: 41556 node.py:587 - INFO - Configuring NFS exports path(s):
/home /data
2012-10-09 08:26:43,174 PID: 41556 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:26:43,175 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:26:43,175 PID: 41556 clustersetup.py:313 - INFO - Mounting all NFS export path(s) on 1 worker node(s)
2012-10-09 08:26:43,177 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:43,362 PID: 41556 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:26:43,363 PID: 41556 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:26:43,364 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:26:43,364 PID: 41556 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:26:43,364 PID: 41556 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:26:43,647 PID: 41556 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t devpts none /dev/pts' failed with status 32
2012-10-09 08:26:43,648 PID: 41556 __init__.py:539 - DEBUG - mount: none already mounted or /dev/pts busy
2012-10-09 08:26:43,649 PID: 41556 __init__.py:539 - DEBUG - mount: according to mtab, devpts is already mounted on /dev/pts
2012-10-09 08:26:43,931 PID: 41556 __init__.py:539 - DEBUG - /dev/sda1 on / type ext4 (rw)
2012-10-09 08:26:43,932 PID: 41556 __init__.py:539 - DEBUG - proc on /proc type proc (rw,noexec,nosuid,nodev)
2012-10-09 08:26:43,933 PID: 41556 __init__.py:539 - DEBUG - sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
2012-10-09 08:26:43,933 PID: 41556 __init__.py:539 - DEBUG - fusectl on /sys/fs/fuse/connections type fusectl (rw)
2012-10-09 08:26:43,933 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/debug type debugfs (rw)
2012-10-09 08:26:43,934 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/security type securityfs (rw)
2012-10-09 08:26:43,934 PID: 41556 __init__.py:539 - DEBUG - udev on /dev type devtmpfs (rw,mode=0755)
2012-10-09 08:26:43,934 PID: 41556 __init__.py:539 - DEBUG - devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
2012-10-09 08:26:43,934 PID: 41556 __init__.py:539 - DEBUG - tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
2012-10-09 08:26:43,935 PID: 41556 __init__.py:539 - DEBUG - none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
2012-10-09 08:26:43,935 PID: 41556 __init__.py:539 - DEBUG - none on /run/shm type tmpfs (rw,nosuid,nodev)
2012-10-09 08:26:43,935 PID: 41556 __init__.py:539 - DEBUG - /dev/xvdb on /mnt type ext3 (rw)
2012-10-09 08:26:44,179 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:44,348 PID: 41556 __init__.py:284 - DEBUG - new /etc/fstab after removing regex ( /home | /data ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2

2012-10-09 08:26:45,181 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:46,183 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:26:47,185 PID: 41556 utils.py:97 - INFO - Setting up NFS took 0.107 mins
2012-10-09 08:26:47,185 PID: 41556 clustersetup.py:221 - INFO - Configuring passwordless ssh for root
2012-10-09 08:26:50,183 PID: 41556 node.py:456 - DEBUG - adding auth_key_contents
2012-10-09 08:26:54,980 PID: 41556 clustersetup.py:229 - INFO - Configuring passwordless ssh for issm
2012-10-09 08:26:57,369 PID: 41556 node.py:456 - DEBUG - adding auth_key_contents
2012-10-09 08:26:57,445 PID: 41556 node.py:464 - DEBUG - adding conn_pubkey_contents
2012-10-09 08:26:58,806 PID: 41556 threadpool.py:123 - INFO - Shutting down threads...
2012-10-09 08:26:58,810 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 3
2012-10-09 08:26:59,972 PID: 41556 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:26:59,972 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:26:59,973 PID: 41556 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:26:59,973 PID: 41556 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:27:00,049 PID: 41556 sge.py:127 - INFO - Configuring SGE...
2012-10-09 08:27:13,336 PID: 41556 node.py:587 - INFO - Configuring NFS exports path(s):
/opt/sge6
2012-10-09 08:27:14,234 PID: 41556 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:27:14,235 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:27:14,236 PID: 41556 clustersetup.py:313 - INFO - Mounting all NFS export path(s) on 1 worker node(s)
2012-10-09 08:27:14,240 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:14,425 PID: 41556 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:27:14,425 PID: 41556 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:27:14,426 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:27:14,427 PID: 41556 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:27:14,427 PID: 41556 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:27:14,711 PID: 41556 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t devpts none /dev/pts' failed with status 32
2012-10-09 08:27:14,711 PID: 41556 __init__.py:539 - DEBUG - mount: none already mounted or /dev/pts busy
2012-10-09 08:27:14,712 PID: 41556 __init__.py:539 - DEBUG - mount: according to mtab, devpts is already mounted on /dev/pts
2012-10-09 08:27:14,996 PID: 41556 __init__.py:539 - DEBUG - /dev/sda1 on / type ext4 (rw)
2012-10-09 08:27:14,997 PID: 41556 __init__.py:539 - DEBUG - proc on /proc type proc (rw,noexec,nosuid,nodev)
2012-10-09 08:27:14,998 PID: 41556 __init__.py:539 - DEBUG - sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
2012-10-09 08:27:14,998 PID: 41556 __init__.py:539 - DEBUG - fusectl on /sys/fs/fuse/connections type fusectl (rw)
2012-10-09 08:27:14,999 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/debug type debugfs (rw)
2012-10-09 08:27:14,999 PID: 41556 __init__.py:539 - DEBUG - none on /sys/kernel/security type securityfs (rw)
2012-10-09 08:27:14,999 PID: 41556 __init__.py:539 - DEBUG - udev on /dev type devtmpfs (rw,mode=0755)
2012-10-09 08:27:14,999 PID: 41556 __init__.py:539 - DEBUG - devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
2012-10-09 08:27:14,999 PID: 41556 __init__.py:539 - DEBUG - tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
2012-10-09 08:27:15,000 PID: 41556 __init__.py:539 - DEBUG - none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
2012-10-09 08:27:15,000 PID: 41556 __init__.py:539 - DEBUG - none on /run/shm type tmpfs (rw,nosuid,nodev)
2012-10-09 08:27:15,000 PID: 41556 __init__.py:539 - DEBUG - /dev/xvdb on /mnt type ext3 (rw)
2012-10-09 08:27:15,000 PID: 41556 __init__.py:539 - DEBUG - master:/home on /home type nfs (rw,nosuid,nodev,vers=3,addr=10.156.165.180,user=root)
2012-10-09 08:27:15,000 PID: 41556 __init__.py:539 - DEBUG - master:/data on /data type nfs (rw,nosuid,nodev,vers=3,addr=10.156.165.180,user=root)
2012-10-09 08:27:15,242 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:15,412 PID: 41556 __init__.py:284 - DEBUG - new /etc/fstab after removing regex ( /opt/sge6 ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2
master:/home /home nfs vers=3,user,rw,exec,noauto 0 0
master:/data /data nfs vers=3,user,rw,exec,noauto 0 0

2012-10-09 08:27:16,243 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:17,245 PID: 41556 utils.py:97 - INFO - Setting up NFS took 0.065 mins
2012-10-09 08:27:17,546 PID: 41556 sge.py:100 - INFO - Installing Sun Grid Engine...
2012-10-09 08:27:23,016 PID: 41556 __init__.py:539 - DEBUG - Reading configuration from file ./ec2_sge.conf
2012-10-09 08:27:23,016 PID: 41556 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/qmaster_install_master_2012-10-09_15:27:17.log
2012-10-09 08:27:23,017 PID: 41556 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/execd_install_master_2012-10-09_15:27:19.log
2012-10-09 08:27:23,386 PID: 41556 __init__.py:539 - DEBUG - linux-x64
2012-10-09 08:27:23,835 PID: 41556 __init__.py:539 - DEBUG - root@master modified "all.q" in cluster queue list
2012-10-09 08:27:24,124 PID: 41556 __init__.py:539 - DEBUG - adminhost "node001" already exists
2012-10-09 08:27:24,413 PID: 41556 __init__.py:539 - DEBUG - submithost "node001" already exists
2012-10-09 08:27:24,414 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:24,605 PID: 41556 __init__.py:536 - DEBUG - command 'source /etc/profile && pkill -9 sge' failed with status 1
2012-10-09 08:27:24,871 PID: 41556 __init__.py:536 - DEBUG - command 'source /etc/profile && rm /etc/init.d/sge*' failed with status 1
2012-10-09 08:27:24,872 PID: 41556 __init__.py:539 - DEBUG - rm: cannot remove `/etc/init.d/sge*': No such file or directory
2012-10-09 08:27:25,246 PID: 41556 __init__.py:539 - DEBUG - linux-x64
2012-10-09 08:27:25,417 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:26,419 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:27,420 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:28,421 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:27:28,572 PID: 41556 __init__.py:539 - DEBUG - Reading configuration from file ./ec2_sge.conf
2012-10-09 08:27:28,573 PID: 41556 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/execd_install_node001_2012-10-09_15:27:25.log
2012-10-09 08:27:29,611 PID: 41556 sge.py:47 - INFO - Creating SGE parallel environment 'orte'
2012-10-09 08:27:29,612 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:27:29,795 PID: 41556 __init__.py:539 - DEBUG - 32
2012-10-09 08:27:29,879 PID: 41556 __init__.py:539 - DEBUG - 32
2012-10-09 08:27:31,029 PID: 41556 __init__.py:539 - DEBUG - root@master added "orte" to parallel environment list
2012-10-09 08:27:31,029 PID: 41556 sge.py:60 - INFO - Adding parallel environment 'orte' to queue 'all.q'
2012-10-09 08:27:31,326 PID: 41556 __init__.py:539 - DEBUG - Unable to find "orte" in "pe_list" of "all.q" - Adding new element.
2012-10-09 08:27:31,326 PID: 41556 __init__.py:539 - DEBUG -
2012-10-09 08:27:31,327 PID: 41556 __init__.py:539 - DEBUG - root@master modified "all.q" in cluster queue list
2012-10-09 08:27:31,328 PID: 41556 threadpool.py:123 - INFO - Shutting down threads...
2012-10-09 08:27:31,332 PID: 41556 threadpool.py:135 - DEBUG - unfinished_tasks = 5
2012-10-09 08:27:32,334 PID: 41556 utils.py:97 - INFO - Configuring cluster took 1.117 mins
2012-10-09 08:27:32,335 PID: 41556 utils.py:97 - INFO - Starting cluster took 3.656 mins
2012-10-09 08:27:32,335 PID: 41556 start.py:200 - INFO -
The cluster is now ready to use. To login to the master node as root, run:

    $ starcluster sshmaster issm

If you're having issues with the cluster you can reboot the instances and completely reconfigure the cluster from scratch using:

    $ starcluster restart issm

When you're finished using the cluster and wish to terminate it and stop paying for service:

    $ starcluster terminate issm

Alternatively, if the cluster uses EBS instances, you can use the 'stop' command to shutdown all nodes and put them into a 'stopped' state preserving the EBS volumes backing the nodes:

    $ starcluster stop issm

WARNING: Any data stored in ephemeral storage (usually /mnt) will be lost!

You can activate a 'stopped' cluster by passing the -x option to the 'start' command:

    $ starcluster start -x issm

This will start all 'stopped' nodes and reconfigure the cluster.

2012-10-09 08:27:48,456 PID: 41570 config.py:551 - DEBUG - Loading config
2012-10-09 08:27:48,456 PID: 41570 config.py:118 - DEBUG - Loading file: /Users/larour/.starcluster/config
2012-10-09 08:27:48,459 PID: 41570 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-east-1, 'port': None}
2012-10-09 08:27:49,212 PID: 41570 cluster.py:672 - DEBUG - existing nodes: {}
2012-10-09 08:27:49,212 PID: 41570 cluster.py:680 - DEBUG - adding node i-321c5a4f to self._nodes list
2012-10-09 08:27:49,213 PID: 41570 cluster.py:680 - DEBUG - adding node i-2c1c5a51 to self._nodes list
2012-10-09 08:27:49,213 PID: 41570 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:27:49,634 PID: 41570 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:27:49,635 PID: 41570 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:27:49,635 PID: 41570 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:27:49,635 PID: 41570 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:27:49,880 PID: 41570 node.py:954 - DEBUG - Using native OpenSSH client
2012-10-09 08:27:49,880 PID: 41570 node.py:965 - DEBUG - ssh_cmd: ssh -i /Users/larour/.ssh/ISSMStarCluster.rsa issm@ec2-23-23-41-97.compute-1.amazonaws.com




Second log, when I run the command:
starcluster stop issm
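
As a quick sanity check after the stop, the volume should be detached again. 'listvolumes' is a standard StarCluster command, and I would expect it to report vol-7d113b07 as 'available' at this point, consistent with the "Detaching volume vol-7d113b07 from master" line near the end of the log below:

    $ starcluster listvolumes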

-----------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------

2012-10-09 08:36:34,609 PID: 41675 config.py:551 - DEBUG - Loading config
2012-10-09 08:36:34,609 PID: 41675 config.py:118 - DEBUG - Loading file: /Users/larour/.starcluster/config
2012-10-09 08:36:34,611 PID: 41675 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-east-1, 'port': None}
2012-10-09 08:36:35,530 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {}
2012-10-09 08:36:35,530 PID: 41675 cluster.py:680 - DEBUG - adding node i-321c5a4f to self._nodes list
2012-10-09 08:36:35,530 PID: 41675 cluster.py:680 - DEBUG - adding node i-2c1c5a51 to self._nodes list
2012-10-09 08:36:35,530 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:35,837 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:35,837 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:35,837 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:35,838 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:35,984 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:35,985 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:35,985 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:35,985 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:41,597 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:41,598 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:41,598 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:41,598 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:41,736 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:41,737 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:41,737 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:41,737 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:41,873 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:41,874 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:41,874 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:41,874 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:42,016 PID: 41675 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:36:42,017 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:36:42,017 PID: 41675 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:36:42,017 PID: 41675 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:36:42,593 PID: 41675 node.py:772 - INFO - Detaching volume vol-7d113b07 from master
2012-10-09 08:36:42,809 PID: 41675 node.py:856 - INFO - Stopping node: master (i-321c5a4f)
2012-10-09 08:36:43,201 PID: 41675 node.py:856 - INFO - Stopping node: node001 (i-2c1c5a51)
2012-10-09 08:36:43,405 PID: 41675 stop.py:89 - WARNING - All non-spot, EBS-backed nodes are now in a 'stopped' state
2012-10-09 08:36:43,406 PID: 41675 stop.py:91 - WARNING - You can restart this cluster by passing -x to the 'start' command
2012-10-09 08:36:43,406 PID: 41675 stop.py:93 - WARNING - Use the 'terminate' command to *completely* terminate this cluster


Third log, when I run the command:
starcluster start -x issm
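
As a manual workaround after the restart, I would expect to be able to re-attach and remount the volume by hand along these lines (untested; the volume/instance IDs and the /dev/sdz device come from the first log above, and ec2-attach-volume is from Amazon's EC2 API tools, not StarCluster):

    $ ec2-attach-volume vol-7d113b07 -i i-321c5a4f -d /dev/sdz
    $ starcluster sshmaster issm
    root@master# mount /dev/xvdz /data    # the Ubuntu AMI exposes /dev/sdz as /dev/xvdz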

-----------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------

2012-10-09 08:41:14,080 PID: 41718 config.py:551 - DEBUG - Loading config
2012-10-09 08:41:14,080 PID: 41718 config.py:118 - DEBUG - Loading file: /Users/larour/.starcluster/config
2012-10-09 08:41:14,083 PID: 41718 awsutils.py:54 - DEBUG - creating self._conn w/ connection_authenticator kwargs = {'proxy_user': None, 'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure': True, 'path': '/', 'region': RegionInfo:us-east-1, 'port': None}
2012-10-09 08:41:14,881 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {}
2012-10-09 08:41:14,881 PID: 41718 cluster.py:680 - DEBUG - adding node i-321c5a4f to self._nodes list
2012-10-09 08:41:14,881 PID: 41718 cluster.py:680 - DEBUG - adding node i-2c1c5a51 to self._nodes list
2012-10-09 08:41:14,881 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:15,351 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:15,351 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:15,351 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:15,352 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:15,352 PID: 41718 cluster.py:1611 - INFO - Validating existing instances...
2012-10-09 08:41:15,352 PID: 41718 cluster.py:1630 - INFO - Validating cluster template settings...
2012-10-09 08:41:16,062 PID: 41718 cluster.py:951 - DEBUG - Launch map: node001 (ami: ami-4583572c, type: cc2.8xlarge)...
2012-10-09 08:41:16,196 PID: 41718 cluster.py:1646 - INFO - Cluster template settings are valid
2012-10-09 08:41:16,196 PID: 41718 cluster.py:1460 - INFO - Starting cluster...
2012-10-09 08:41:16,348 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:16,349 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:16,349 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:16,349 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:16,350 PID: 41718 cluster.py:1466 - INFO - Starting stopped node: node001
2012-10-09 08:41:17,675 PID: 41718 cluster.py:1267 - INFO - Waiting for cluster to come up... (updating every 10s)
2012-10-09 08:41:18,005 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:18,005 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:18,005 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:18,005 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:18,006 PID: 41718 cluster.py:1235 - INFO - Waiting for all nodes to be in a 'running' state...
2012-10-09 08:41:18,144 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:18,145 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:18,145 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:18,145 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:28,284 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:28,284 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:28,285 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:28,285 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:38,430 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:38,430 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:38,430 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:38,430 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:48,570 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:48,570 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:48,571 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:48,571 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:41:58,714 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:41:58,714 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:41:58,714 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:41:58,715 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:08,861 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:42:08,861 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:42:08,861 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:42:08,862 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:19,006 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:42:19,007 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:42:19,007 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:42:19,007 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:29,165 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:42:29,166 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:42:29,166 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:42:29,166 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:39,416 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:42:39,417 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:42:39,417 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:42:39,417 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:39,417 PID: 41718 cluster.py:1253 - INFO - Waiting for SSH to come up on all nodes...
2012-10-09 08:42:39,557 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:42:39,558 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:42:39,558 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:42:39,558 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:42:39,561 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:42:39,771 PID: 41718 __init__.py:75 - DEBUG - loading private key /Users/larour/.ssh/ISSMStarCluster.rsa
2012-10-09 08:42:39,772 PID: 41718 __init__.py:167 - DEBUG - Using private key /Users/larour/.ssh/ISSMStarCluster.rsa (rsa)
2012-10-09 08:42:39,772 PID: 41718 __init__.py:97 - DEBUG - connecting to host ec2-54-242-1-177.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:42:40,028 PID: 41718 __init__.py:75 - DEBUG - loading private key /Users/larour/.ssh/ISSMStarCluster.rsa
2012-10-09 08:42:40,028 PID: 41718 __init__.py:167 - DEBUG - Using private key /Users/larour/.ssh/ISSMStarCluster.rsa (rsa)
2012-10-09 08:42:40,029 PID: 41718 __init__.py:97 - DEBUG - connecting to host ec2-107-20-8-165.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:42:40,582 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:42:41,583 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:42:41,811 PID: 41718 __init__.py:186 - DEBUG - creating sftp connection
2012-10-09 08:42:42,584 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:42:43,585 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:44,587 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:45,588 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:46,590 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:47,591 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:48,593 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:49,594 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:50,595 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:51,597 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:52,598 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:53,599 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:54,601 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:55,602 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:56,603 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:57,604 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:58,606 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:42:59,607 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:00,609 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:01,610 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:02,612 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:03,612 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:04,613 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:05,615 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:06,616 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:07,618 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:08,619 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:09,620 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:10,622 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:11,623 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:12,625 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:13,626 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:14,627 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:15,629 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:16,630 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:17,632 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:18,634 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:19,635 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:20,192 PID: 41718 __init__.py:97 - DEBUG - connecting to host ec2-107-20-8-165.compute-1.amazonaws.com on port 22 as user root
2012-10-09 08:43:20,637 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:21,309 PID: 41718 __init__.py:186 - DEBUG - creating sftp connection
2012-10-09 08:43:21,638 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:22,640 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:23,642 PID: 41718 utils.py:97 - INFO - Waiting for cluster to come up took 2.099 mins
2012-10-09 08:43:23,642 PID: 41718 cluster.py:1487 - INFO - The master node is ec2-54-242-1-177.compute-1.amazonaws.com
2012-10-09 08:43:23,643 PID: 41718 cluster.py:1488 - INFO - Setting up the cluster...
2012-10-09 08:43:23,782 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:43:23,782 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:43:23,783 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:43:23,783 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:43:23,783 PID: 41718 clustersetup.py:90 - INFO - Configuring hostnames...
2012-10-09 08:43:23,786 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:43:25,620 PID: 41718 clustersetup.py:154 - INFO - Creating cluster user: None (uid: 1001, gid: 1001)
2012-10-09 08:43:25,620 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:43:25,927 PID: 41718 clustersetup.py:169 - DEBUG - user issm exists on node node001, no action
2012-10-09 08:43:25,929 PID: 41718 clustersetup.py:169 - DEBUG - user issm exists on node master, no action
2012-10-09 08:43:26,623 PID: 41718 clustersetup.py:200 - INFO - Configuring scratch space for user(s): issm
2012-10-09 08:43:26,624 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:43:27,627 PID: 41718 clustersetup.py:209 - INFO - Configuring /etc/hosts on each node
2012-10-09 08:43:27,629 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:43:27,935 PID: 41718 __init__.py:284 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-156-163-60.ec2.internal ip-10-156-163-60

2012-10-09 08:43:27,936 PID: 41718 __init__.py:284 - DEBUG - new /etc/hosts after removing regex (master|node001) matches:
127.0.0.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
# Added by cloud-init
127.0.1.1 ip-10-156-161-234.ec2.internal ip-10-156-161-234

2012-10-09 08:43:28,632 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:43:29,634 PID: 41718 node.py:616 - INFO - Starting NFS server on master
2012-10-09 08:43:29,821 PID: 41718 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:43:29,821 PID: 41718 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:43:29,822 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:29,822 PID: 41718 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:43:29,822 PID: 41718 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:43:30,106 PID: 41718 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs/' failed with status 32
2012-10-09 08:43:30,107 PID: 41718 __init__.py:539 - DEBUG - mount: sunrpc already mounted or /var/lib/nfs/rpc_pipefs/ busy
2012-10-09 08:43:30,108 PID: 41718 __init__.py:539 - DEBUG - mount: according to mtab, rpc_pipefs is already mounted on /var/lib/nfs/rpc_pipefs
2012-10-09 08:43:30,549 PID: 41718 __init__.py:539 - DEBUG - * Exporting directories for NFS kernel daemon...
2012-10-09 08:43:30,549 PID: 41718 __init__.py:539 - DEBUG - ...done.
2012-10-09 08:43:30,550 PID: 41718 __init__.py:539 - DEBUG - * Starting NFS kernel daemon
2012-10-09 08:43:30,551 PID: 41718 __init__.py:539 - DEBUG - ...done.
2012-10-09 08:43:30,552 PID: 41718 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:43:30,552 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:30,835 PID: 41718 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:43:30,835 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:30,836 PID: 41718 node.py:587 - INFO - Configuring NFS exports path(s):
/home
2012-10-09 08:43:31,672 PID: 41718 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:43:31,672 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:31,673 PID: 41718 clustersetup.py:313 - INFO - Mounting all NFS export path(s) on 1 worker node(s)
2012-10-09 08:43:31,674 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:31,861 PID: 41718 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:43:31,862 PID: 41718 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:43:31,863 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:31,863 PID: 41718 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:43:31,863 PID: 41718 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:43:32,148 PID: 41718 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t devpts none /dev/pts' failed with status 32
2012-10-09 08:43:32,149 PID: 41718 __init__.py:539 - DEBUG - mount: none already mounted or /dev/pts busy
2012-10-09 08:43:32,149 PID: 41718 __init__.py:539 - DEBUG - mount: according to mtab, devpts is already mounted on /dev/pts
2012-10-09 08:43:32,433 PID: 41718 __init__.py:539 - DEBUG - /dev/sda1 on / type ext4 (rw)
2012-10-09 08:43:32,434 PID: 41718 __init__.py:539 - DEBUG - proc on /proc type proc (rw,noexec,nosuid,nodev)
2012-10-09 08:43:32,435 PID: 41718 __init__.py:539 - DEBUG - sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
2012-10-09 08:43:32,435 PID: 41718 __init__.py:539 - DEBUG - fusectl on /sys/fs/fuse/connections type fusectl (rw)
2012-10-09 08:43:32,436 PID: 41718 __init__.py:539 - DEBUG - none on /sys/kernel/debug type debugfs (rw)
2012-10-09 08:43:32,436 PID: 41718 __init__.py:539 - DEBUG - none on /sys/kernel/security type securityfs (rw)
2012-10-09 08:43:32,436 PID: 41718 __init__.py:539 - DEBUG - udev on /dev type devtmpfs (rw,mode=0755)
2012-10-09 08:43:32,437 PID: 41718 __init__.py:539 - DEBUG - devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
2012-10-09 08:43:32,437 PID: 41718 __init__.py:539 - DEBUG - tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
2012-10-09 08:43:32,437 PID: 41718 __init__.py:539 - DEBUG - none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
2012-10-09 08:43:32,437 PID: 41718 __init__.py:539 - DEBUG - none on /run/shm type tmpfs (rw,nosuid,nodev)
2012-10-09 08:43:32,437 PID: 41718 __init__.py:539 - DEBUG - /dev/xvdb on /mnt type ext3 (rw)
2012-10-09 08:43:32,676 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:32,859 PID: 41718 __init__.py:284 - DEBUG - new /etc/fstab after removing regex ( /home ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2
master:/data /data nfs vers=3,user,rw,exec,noauto 0 0
master:/opt/sge6 /opt/sge6 nfs vers=3,user,rw,exec,noauto 0 0

2012-10-09 08:43:33,678 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:34,680 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:35,681 PID: 41718 utils.py:97 - INFO - Setting up NFS took 0.101 mins
2012-10-09 08:43:35,682 PID: 41718 clustersetup.py:221 - INFO - Configuring passwordless ssh for root
2012-10-09 08:43:36,220 PID: 41718 node.py:427 - DEBUG - Using existing key: /root/.ssh/id_rsa
2012-10-09 08:43:38,854 PID: 41718 __init__.py:284 - DEBUG - new /root/.ssh/known_hosts after removing regex (node001|ip-10-156-161-234.ec2.internal|ip-10-156-161-234|ec2-107-20-8-165.compute-1.amazonaws.com) matches:
ip-10-156-164-228,10.156.164.228 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ip-10-156-164-228.ec2.internal,10.156.164.228 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ec2-23-20-180-11.compute-1.amazonaws.com,23.20.180.11 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ip-10-156-165-180.ec2.internal,10.156.165.180 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t
ec2-23-23-41-97.compute-1.amazonaws.com,23.23.41.97 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t
master,10.156.165.180 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t
ip-10-156-165-180,10.156.165.180 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t

2012-10-09 08:43:43,195 PID: 41718 clustersetup.py:229 - INFO - Configuring passwordless ssh for issm
2012-10-09 08:43:43,752 PID: 41718 node.py:427 - DEBUG - Using existing key: /home/issm/.ssh/id_rsa
2012-10-09 08:43:45,963 PID: 41718 __init__.py:284 - DEBUG - new /home/issm/.ssh/known_hosts after removing regex (node001|ip-10-156-161-234.ec2.internal|ip-10-156-161-234|ec2-107-20-8-165.compute-1.amazonaws.com|master|ip-10-156-163-60.ec2.internal|ip-10-156-163-60|ec2-54-242-1-177.compute-1.amazonaws.com) matches:
ip-10-156-164-228,10.156.164.228 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ip-10-156-164-228.ec2.internal,10.156.164.228 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ec2-23-20-180-11.compute-1.amazonaws.com,23.20.180.11 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSMA2ybkEu7KcSyPEqZ1tX0lGURtqY4f0rRk40gRt7VJncMJ2qQAPusPzxfmirUwKaYWamQr/z4K/TfP/qvBwknaP8HfR40conTlSDlEUHHpJ/ziky7ztQEPmH5UiMSdElJGZoGeUCOsVneuffuXQN0sQl/1OPd2KrFKn6YdAKD9lYxQKWuXEAFus7W98rj+1gjymavktpljX7SOtRyzxAVAVqeooCEco+aF1zSc5HCqS8sYcvgTUSGuhMjyYkoStxFEmHvdGYDBetxTfvUaya3r1TNgnuX2G7Bn5mcYmFUbfTGHW7uqjzf6HCm3Lzac0i0vLi/xKKtmwP1ceoanMd
ip-10-156-165-180.ec2.internal,10.156.165.180 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t
ec2-23-23-41-97.compute-1.amazonaws.com,23.23.41.97 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t
ip-10-156-165-180,10.156.165.180 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC57A8tlD+3/LGz+ifzyHPYL4tQAZ85X+9kzlC10qfWs6isXZlb93fy4AAYWhINwYioadLdnVDhOkqzxaLSf8YDtSJnlYXHYdeS/ZOIhdCqFxchxW9gJFb5xSU30D567Xf4CpaPegtzaEpVfbdovIiLJgjIKYSf/7AoYv51XKjaurDh9CSQlsNhUbJdShQQzFIWXixfVkn+djPupqe2hwyR2Nj5+J3/+N10sg0KTxVO78Os2nyUVzg6Y8TRG8pFXiUnR4H6gJKiXwJrPB+YPIOJoqXK/5nZEa7qRh+L3BzkYbnqENLquSNWGQ3vuJls+dRQMGNKRL0z2J3i+b5sw+3t

2012-10-09 08:43:46,966 PID: 41718 threadpool.py:123 - INFO - Shutting down threads...
2012-10-09 08:43:46,970 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:48,130 PID: 41718 cluster.py:672 - DEBUG - existing nodes: {u'i-2c1c5a51': <Node: node001 (i-2c1c5a51)>, u'i-321c5a4f': <Node: master (i-321c5a4f)>}
2012-10-09 08:43:48,131 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-321c5a4f in self._nodes
2012-10-09 08:43:48,131 PID: 41718 cluster.py:675 - DEBUG - updating existing node i-2c1c5a51 in self._nodes
2012-10-09 08:43:48,131 PID: 41718 cluster.py:688 - DEBUG - returning self._nodes = [<Node: master (i-321c5a4f)>, <Node: node001 (i-2c1c5a51)>]
2012-10-09 08:43:48,208 PID: 41718 sge.py:127 - INFO - Configuring SGE...
2012-10-09 08:43:48,285 PID: 41718 node.py:587 - INFO - Configuring NFS exports path(s):
/opt/sge6
2012-10-09 08:43:49,006 PID: 41718 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:43:49,007 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:49,008 PID: 41718 clustersetup.py:313 - INFO - Mounting all NFS export path(s) on 1 worker node(s)
2012-10-09 08:43:49,013 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:49,199 PID: 41718 __init__.py:539 - DEBUG - Rather than invoking init scripts through /etc/init.d, use the service(8)
2012-10-09 08:43:49,200 PID: 41718 __init__.py:539 - DEBUG - utility, e.g. service portmap start
2012-10-09 08:43:49,200 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:49,200 PID: 41718 __init__.py:539 - DEBUG - Since the script you are attempting to invoke has been converted to an
2012-10-09 08:43:49,201 PID: 41718 __init__.py:539 - DEBUG - Upstart job, you may also use the start(8) utility, e.g. start portmap
2012-10-09 08:43:49,485 PID: 41718 __init__.py:536 - DEBUG - command 'source /etc/profile && mount -t devpts none /dev/pts' failed with status 32
2012-10-09 08:43:49,486 PID: 41718 __init__.py:539 - DEBUG - mount: none already mounted or /dev/pts busy
2012-10-09 08:43:49,486 PID: 41718 __init__.py:539 - DEBUG - mount: according to mtab, devpts is already mounted on /dev/pts
2012-10-09 08:43:49,770 PID: 41718 __init__.py:539 - DEBUG - /dev/sda1 on / type ext4 (rw)
2012-10-09 08:43:49,770 PID: 41718 __init__.py:539 - DEBUG - proc on /proc type proc (rw,noexec,nosuid,nodev)
2012-10-09 08:43:49,771 PID: 41718 __init__.py:539 - DEBUG - sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
2012-10-09 08:43:49,771 PID: 41718 __init__.py:539 - DEBUG - fusectl on /sys/fs/fuse/connections type fusectl (rw)
2012-10-09 08:43:49,771 PID: 41718 __init__.py:539 - DEBUG - none on /sys/kernel/debug type debugfs (rw)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - none on /sys/kernel/security type securityfs (rw)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - udev on /dev type devtmpfs (rw,mode=0755)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - none on /run/shm type tmpfs (rw,nosuid,nodev)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - /dev/xvdb on /mnt type ext3 (rw)
2012-10-09 08:43:49,772 PID: 41718 __init__.py:539 - DEBUG - master:/home on /home type nfs (rw,nosuid,nodev,vers=3,addr=10.156.163.60,user=root)
2012-10-09 08:43:50,015 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:50,188 PID: 41718 __init__.py:284 - DEBUG - new /etc/fstab after removing regex ( /opt/sge6 ) matches:
LABEL=cloudimg-rootfs / ext4 defaults 0 0
/dev/xvdb /mnt auto defaults,nobootwait,comment=cloudconfig 0 2
master:/data /data nfs vers=3,user,rw,exec,noauto 0 0
master:/home /home nfs vers=3,user,rw,exec,noauto 0 0

2012-10-09 08:43:51,017 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:43:52,018 PID: 41718 utils.py:97 - INFO - Setting up NFS took 0.062 mins
2012-10-09 08:43:52,149 PID: 41718 sge.py:88 - INFO - Removing previous SGE installation...
2012-10-09 08:43:52,694 PID: 41718 __init__.py:539 - DEBUG - exportfs: scandir /etc/exports.d: No such file or directory
2012-10-09 08:43:52,695 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:43:53,041 PID: 41718 sge.py:100 - INFO - Installing Sun Grid Engine...
2012-10-09 08:44:00,615 PID: 41718 __init__.py:539 - DEBUG - Reading configuration from file ./ec2_sge.conf
2012-10-09 08:44:00,616 PID: 41718 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/qmaster_install_master_2012-10-09_15:43:53.log
2012-10-09 08:44:00,617 PID: 41718 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/execd_install_master_2012-10-09_15:43:57.log
2012-10-09 08:44:00,990 PID: 41718 __init__.py:539 - DEBUG - linux-x64
2012-10-09 08:44:01,450 PID: 41718 __init__.py:539 - DEBUG - root_at_master modified "all.q" in cluster queue list
2012-10-09 08:44:01,727 PID: 41718 __init__.py:539 - DEBUG - adminhost "node001" already exists
2012-10-09 08:44:02,003 PID: 41718 __init__.py:539 - DEBUG - submithost "node001" already exists
2012-10-09 08:44:02,005 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:02,208 PID: 41718 __init__.py:536 - DEBUG - command 'source /etc/profile && pkill -9 sge' failed with status 1
2012-10-09 08:44:02,493 PID: 41718 __init__.py:536 - DEBUG - command 'source /etc/profile && rm /etc/init.d/sge*' failed with status 1
2012-10-09 08:44:02,494 PID: 41718 __init__.py:539 - DEBUG - rm: cannot remove `/etc/init.d/sge*': No such file or directory
2012-10-09 08:44:02,873 PID: 41718 __init__.py:539 - DEBUG - linux-x64
2012-10-09 08:44:03,007 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:04,009 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:05,009 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:06,011 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:07,013 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 1
2012-10-09 08:44:07,131 PID: 41718 __init__.py:539 - DEBUG - Reading configuration from file ./ec2_sge.conf
2012-10-09 08:44:07,132 PID: 41718 __init__.py:539 - DEBUG - Install log can be found in: /opt/sge6/default/common/install_logs/execd_install_node001_2012-10-09_15:44:03.log
2012-10-09 08:44:08,204 PID: 41718 sge.py:47 - INFO - Creating SGE parallel environment 'orte'
2012-10-09 08:44:08,205 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 2
2012-10-09 08:44:08,389 PID: 41718 __init__.py:539 - DEBUG - 32
2012-10-09 08:44:08,477 PID: 41718 __init__.py:539 - DEBUG - 32
2012-10-09 08:44:09,627 PID: 41718 __init__.py:539 - DEBUG - root_at_master added "orte" to parallel environment list
2012-10-09 08:44:09,627 PID: 41718 sge.py:60 - INFO - Adding parallel environment 'orte' to queue 'all.q'
2012-10-09 08:44:09,919 PID: 41718 __init__.py:539 - DEBUG - Unable to find "orte" in "pe_list" of "all.q" - Adding new element.
2012-10-09 08:44:09,920 PID: 41718 __init__.py:539 - DEBUG -
2012-10-09 08:44:09,921 PID: 41718 __init__.py:539 - DEBUG - root_at_master modified "all.q" in cluster queue list
2012-10-09 08:44:09,921 PID: 41718 threadpool.py:123 - INFO - Shutting down threads...
2012-10-09 08:44:09,924 PID: 41718 threadpool.py:135 - DEBUG - unfinished_tasks = 4
2012-10-09 08:44:10,926 PID: 41718 utils.py:97 - INFO - Configuring cluster took 0.788 mins
2012-10-09 08:44:10,926 PID: 41718 utils.py:97 - INFO - Starting cluster took 2.912 mins
2012-10-09 08:44:10,927 PID: 41718 start.py:200 - INFO -
The cluster is now ready to use. To login to the master node as root, run:

    $ starcluster sshmaster issm

If you're having issues with the cluster you can reboot the instances and completely reconfigure the cluster from scratch using:

    $ starcluster restart issm

When you're finished using the cluster and wish to terminate it and stop paying for service:

    $ starcluster terminate issm

Alternatively, if the cluster uses EBS instances, you can use the 'stop' command to shutdown all nodes and put them into a 'stopped' state preserving the EBS volumes backing the nodes:

    $ starcluster stop issm

WARNING: Any data stored in ephemeral storage (usually /mnt) will be lost!

You can activate a 'stopped' cluster by passing the -x option to the 'start' command:

    $ starcluster start -x issm

This will start all 'stopped' nodes and reconfigure the cluster.





-----------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------

I hope this helps point to the problem. Reading the third log, I'm not sure why the device /dev/xvdz
does not exist, nor why the EBS volume is never attached in the first place.
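
For completeness, here is the kind of quick check that should show whether the device ever appears (a sketch, not actual output; listvolumes is StarCluster's volume-listing command, and /dev/sdz is the attach point used on the first start):

    $ starcluster listvolumes
    $ starcluster sshmaster issm
    root_at_master:~# ls -l /dev/sdz /dev/xvdz
    root_at_master:~# cat /proc/partitions

On these Ubuntu AMIs the kernel exposes /dev/sdz as /dev/xvdz, so if the attach had happened, xvdz would be listed in /proc/partitions.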



Thanks again for any help on this.


Eric L.

From: JPL <larour_at_jpl.nasa.gov>
Date: Mon, 8 Oct 2012 22:41:34 -0700
To: <starcluster_at_mit.edu>
Subject: EBS volume not mounting on restart

Dear folks,

I have the following problem when creating a cluster and mounting an EBS volume
on /data. Here is the part of my config file corresponding to my template:

[cluster issm]
# change this to the name of one of the keypair sections defined above
KEYNAME = ISSMStarCluster
# number of ec2 instances to launch
CLUSTER_SIZE = 2
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-4583572c
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, hi1.4xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = cc2.8xlarge
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
VOLUMES = issm

# Sections starting with "volume" define your EBS volumes
[volume issm]
VOLUME_ID = vol-7d113b07
MOUNT_PATH = /data
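
As a sanity check on this volume section, the volume can be listed to confirm its ID and availability zone (sketch; the cluster has to come up in the same zone for the attach to succeed):

    $ starcluster listvolumes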



When I first start this cluster with starcluster start issm, everything works perfectly:

$ starcluster start issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Using default cluster template: issm
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group _at_sc-issm...
>>> Creating placement group _at_sc-issm...
Reservation:r-e3538485
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 2.281 mins
>>> The master node is ec2-107-22-25-149.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-7d113b07 to master node on /dev/sdz ...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-7d113b07 on /data...
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
0/2 | | 0%


2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home /data
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.152 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.102 mins
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 1.506 mins
>>> Starting cluster took 3.877 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm



I checked: /data is correctly mounted on my EBS volume, and everything is fine.
Here is a df dump:

root_at_master:/data# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 8246240 5386292 2441056 69% /
udev 31263832 4 31263828 1% /dev
tmpfs 12507188 220 12506968 1% /run
none 5120 0 5120 0% /run/lock
none 31267964 0 31267964 0% /run/shm
/dev/xvdb 866917368 205028 822675452 1% /mnt
/dev/xvdz 103212320 192268 97777172 1% /data

The EBS volume I'm mounting is 100 GB in size, and df reports 103212320 1K-blocks (roughly 103 GB) on /dev/xvdz, so everything checks out.


Now, if I stop the cluster and start it again using the -x option, the cluster boots
fine, but it does not attach the volume (it never even attempts to) and does not try
to mount /data. It's as though the [volume issm] section of my config did not exist!
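
Since the debug logs above come straight from StarCluster's log file, one quick way to see whether the volume-attach code is reached at all during the restart is to grep that file (assuming the default log location under ~/.starcluster):

    $ grep -i -E 'volume|attach' ~/.starcluster/logs/debug.log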


Here is the output of the starcluster start -x issm command:

$ starcluster start -c issm -x issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu<mailto:starcluster_at_mit.edu>

>>> Validating existing instances...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Starting stopped node: node001
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.780 mins
>>> The master node is ec2-23-22-242-221.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.106 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.065 mins
>>> Removing previous SGE installation...
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 0.846 mins
>>> Starting cluster took 2.647 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm




As you can see, no attempt was made to attach the EBS volume, and mounting of
/data was never attempted! When I log in, there is no EBS volume device for /data either.
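
A check along these lines on the master confirms it (sketch; /dev/xvdb is the ephemeral drive visible in the mount listing above):

    root_at_master:~# ls /dev/xvd*                        # only /dev/xvdb (ephemeral) is present, no xvdz
    root_at_master:~# grep /data /etc/fstab /proc/mounts  # no /data entry in either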




Any help or pointers would be appreciated!

Thanks in advance!

Eric L.

--------------------------------------------------------------------------
Dr. Eric Larour, Software Engineer III,
ISSM Task Manager (http://issm.jpl.nasa.gov)
Mechanical division, Propulsion Thermal and Materials Section, Applied Low Temperature Physics Group.
Jet Propulsion Laboratory.
MS 79-24, 4800 Oak Grove Drive, Pasadena CA 91109.
eric.larour_at_jpl.nasa.gov
http://issm.jpl.nasa.gov
Tel: 1 818 393 2435.
--------------------------------------------------------------------------
Received on Tue Oct 09 2012 - 12:21:03 EDT