I have had limited success getting StarCluster to launch a cluster with
EC2-VPC nodes under the development version (0.9999). With one AMI I can
easily launch a StarCluster cluster with EC2-VPC nodes, but with a
different AMI the launch fails. I set the config variables "VPC_ID" and
"SUBNET_ID" in both cases, and the only difference between the two
cluster templates is the AMI.
Both AMIs successfully launch a StarCluster cluster with EC2-Classic
nodes. The only difference I have noticed between the AMIs is that the one
that launches successfully with EC2-VPC nodes is a private AMI that is
"shared" with the account I am running my VPC in. The AMI that does not
work with StarCluster in the VPC is a private AMI "owned" by the account
I am running my VPC in.
I believe the error has something to do with the tags, specifically the
value of the "@sc-core" tag exceeding 255 characters, but I could be
wrong. Below I have included the output of the successful launch, the
failed launch (including the error message), and the cluster listing
after both commands.
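To make the suspected cause concrete, here is a minimal sketch in Python of how a tag value could be checked against the 255-character limit quoted in the error message, and how an oversized value could be spread over several numbered tags. All function names and the numbered-key scheme are hypothetical and for illustration only; they are not part of StarCluster or boto, and this is not necessarily how StarCluster builds or should build its tags.

```python
# Hypothetical helpers for illustration only; not StarCluster or boto APIs.
MAX_TAG_VALUE_LEN = 255  # limit quoted in the EC2 error message


def tag_value_fits(value, limit=MAX_TAG_VALUE_LEN):
    """Return True if a tag value is within the length limit."""
    return len(value) <= limit


def split_tag_value(value, limit=MAX_TAG_VALUE_LEN):
    """Split a value into consecutive chunks of at most `limit` characters."""
    return [value[i:i + limit] for i in range(0, len(value), limit)] or [""]


def chunked_tags(base_key, value, limit=MAX_TAG_VALUE_LEN):
    """Spread one value over numbered tags: base_key-0, base_key-1, ..."""
    chunks = split_tag_value(value, limit)
    return {"%s-%d" % (base_key, i): chunk for i, chunk in enumerate(chunks)}


def join_chunked_tags(base_key, tags):
    """Rebuild the original value from tags produced by chunked_tags()."""
    prefix = base_key + "-"
    parts = sorted(
        (int(key[len(prefix):]), val)
        for key, val in tags.items()
        if key.startswith(prefix)
    )
    return "".join(val for _, val in parts)
```

For example, chunked_tags("example-tag", "x" * 600) produces three tags (example-tag-0, example-tag-1, example-tag-2), each with a value of at most 255 characters, and join_chunked_tags("example-tag", ...) restores the original string. EC2 also limits the number of tags per resource, so splitting only helps up to a point.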
Any suggestions on how to address this issue would be greatly appreciated.
Thanks in advance for the help,
-Jennifer
-------------------------------------------------------------------------------------------------
------ Below is what it looks like when I have a successful launch ---
-------------------------------------------------------------------------------------------------
(starcluster)root@xxxxxxxxxxx:~# starcluster start -c testvpcA vpcA
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-vpcA...
Reservation:r-2843fa4e
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for SSH to come up on all nodes...
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for cluster to come up took 1.574 mins
>>> The master node is
>>> Configuring cluster...
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Creating cluster user: sgeadmin (uid: 1007, gid: 1000)
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Configuring scratch space for user(s): sgeadmin
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Configuring /etc/hosts on each node
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Starting NFS server on master
>>> Setting up NFS took 0.113 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Configuring SGE...
>>> Setting up NFS took 0.000 mins
>>> Removing previous SGE installation...
>>> Installing Sun Grid Engine...
>>> Creating SGE parallel environment 'orte'
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Configuring cluster took 0.679 mins
>>> Starting cluster took 2.307 mins
The cluster is now ready to use. To login to the master node
as root, run:
$ starcluster sshmaster vpcA
If you're having issues with the cluster you can reboot the
instances and completely reconfigure the cluster from
scratch using:
$ starcluster restart vpcA
When you're finished using the cluster and wish to terminate
it and stop paying for service:
$ starcluster terminate vpcA
Alternatively, if the cluster uses EBS instances, you can
use the 'stop' command to shutdown all nodes and put them
into a 'stopped' state preserving the EBS volumes backing
the nodes:
$ starcluster stop vpcA
WARNING: Any data stored in ephemeral storage (usually /mnt)
will be lost!
You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:
$ starcluster start -x vpcA
This will start all 'stopped' nodes and reconfigure the
cluster.
-------------------------------------------------------------------------------------------------
------ Below is what it looks like when I have a FAILED launch ---
-------------------------------------------------------------------------------------------------
(starcluster)root@xxxxxxxxxxx:~# starcluster start -c testvpcB vpcB
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-vpcB...
!!! ERROR - InvalidParameterValue: Tag value exceeds the maximum length of 255 characters
Traceback (most recent call last):
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cli.py", line 274, in main
    sc.execute(args)
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/commands/start.py", line 220, in execute
    validate_running=validate_running)
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 1537, in start
    return self._start(create=create, create_only=create_only)
  File "<string>", line 2, in _start
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/utils.py", line 111, in wrap_f
    res = func(*arg, **kargs)
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 1552, in _start
    self.create_cluster()
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 1066, in create_cluster
    self._create_flat_rate_cluster()
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 1091, in _create_flat_rate_cluster
    force_flat=True)[0]
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 859, in create_nodes
    cluster_sg = self.cluster_group.name
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 657, in cluster_group
    self._add_tags_to_sg(sg)
  File "/root/.virtualenvs/starcluster/starcluster/starcluster/cluster.py", line 698, in _add_tags_to_sg
    sg.add_tag(static.CORE_TAG, core_settings)
  File "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/ec2/ec2object.py", line 82, in add_tag
    dry_run=dry_run
  File "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/ec2/connection.py", line 4026, in create_tags
    return self.get_status('CreateTags', params, verb='POST')
  File "/root/.virtualenvs/starcluster/local/lib/python2.7/site-packages/boto-2.19.0-py2.7.egg/boto/connection.py", line 1158, in get_status
    raise self.ResponseError(response.status, response.reason, body)
EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Tag value exceeds the maximum length of 255 characters</Message></Error></Errors><RequestID>1f589605-8f30-472d-8989-22ea120aea14</RequestID></Response>
-----------------------------------------------------------------------------------------------------------------
------ When it FAILS it creates only a security group; see "listclusters" below ---
-----------------------------------------------------------------------------------------------------------------
(starcluster)root@xxxxxxxxxxx:~# starcluster listclusters
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
-------------------------------
vpcB (security group: @sc-vpcB)
-------------------------------
Launch time: N/A
Uptime: N/A
Zone: N/A
Keypair: N/A
EBS volumes: N/A
Cluster nodes: N/A
-------------------------------
vpcA (security group: @sc-vpcA)
-------------------------------
Launch time: 2013-12-10 14:39:36
Uptime: 0 days, 00:04:23
Zone: us-east-1b
Keypair: Starcluster_VPC
EBS volumes: N/A
Cluster nodes:
master running i-1d745b65 10.0.0.138
Total nodes: 1
(starcluster)root@xxxxxxxxxxx:~#
Received on Tue Dec 10 2013 - 11:23:23 EST