StarCluster - Mailing List Archive

Re: createvolume works / mount fails

From: Lyn Gerner <no email>
Date: Wed, 13 Feb 2013 16:09:52 -0800

In case this is of use to anyone else, here's what I had to do to get an
"extra" (not /home, not /opt) volume to mount:

Step 1: used "starcluster createvolume" to create a 5GB volume.

Step 2: had to make sure that the availability zone of the cluster and the
volume match. Here vol-f1a0d380 is in us-east-1d (a us-east-1c volume can't
be mounted on a us-east-1d cluster):

[cluster jobscluster-e1d]
# Declares that this cluster uses smallcluster as defaults
EXTENDS=smallcluster
AVAILABILITY_ZONE = us-east-1d
VOLUMES = jobspoolse1d

[volume jobspoolse1d]
VOLUME_ID = vol-f1a0d380
MOUNT_PATH = /usr/share/jobs/
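
As a quick sanity check outside of starcluster (just a sketch, using the
boto library that starcluster is built on; the region and zone below mirror
my config and would need adjusting for other setups):

import boto.ec2

# Region and zone values mirror the [cluster] section above.
conn = boto.ec2.connect_to_region('us-east-1')
vol = conn.get_all_volumes(volume_ids=['vol-f1a0d380'])[0]
cluster_zone = 'us-east-1d'  # AVAILABILITY_ZONE from the cluster template

if vol.zone != cluster_zone:
    print('volume %s is in %s, not %s; it will not attach to this cluster'
          % (vol.id, vol.zone, cluster_zone))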

With that setup, and the volume located in us-east-1d, the volume still
didn't mount when starcluster attempted to mount it as device /dev/sdz (the
first device name that starcluster defaults to).

I then used the AWS management console to attach the volume to the master,
and found it interesting that the range of devices offered by that interface
is limited to /dev/sd[f-p] -- so /dev/sdz is invalid according to the
console's rules.
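
For anyone who wants to test the attachment by hand instead of through the
console, a minimal boto sketch along the same lines would be (the instance
id here is a made-up placeholder for the master):

import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
# 'i-12345678' is a placeholder; use the master's actual instance id.
conn.attach_volume('vol-f1a0d380', 'i-12345678', '/dev/sdf')
# On the instance, the kernel typically exposes /dev/sdf as /dev/xvdf,
# so that's the device name to look for before mounting.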

Since I could see the device being set in the relevant starcluster code, I
gave this a try:

Step 3: set the device explicitly in the starcluster volume definition,
within the sd[f-p] range:

[volume jobspoolse1d]
VOLUME_ID = vol-f1a0d380
MOUNT_PATH = /usr/share/jobs/
DEVICE = /dev/sdf

This combination of settings allowed the device to be properly mounted on
the master, and successfully exported to and mounted on the compute hosts.
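
To confirm from the client side that the attachment landed where DEVICE
points, a small boto query (again only a sketch) shows what EC2 reports,
matching the "starcluster listclusters" output:

import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
vol = conn.get_all_volumes(volume_ids=['vol-f1a0d380'])[0]
# attach_data holds the instance id and device EC2 recorded for the attachment.
print('%s attached to %s as %s (status: %s)'
      % (vol.id, vol.attach_data.instance_id, vol.attach_data.device,
         vol.status))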

Fyi,
Lyn

On Tue, Feb 12, 2013 at 7:44 PM, Lyn Gerner <schedulerqueen_at_gmail.com> wrote:

> I read the code in clustersetup.py and retried this process w/no tags
> or any non-essential data associated w/the latest created volume,
> vol-52fa8f23. Same failure mode as before:
>
> .starcluster mary$ sc start -b 0.25 -i m1.small -I m1.small -c jobscluster
> jobscluster
> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> *** WARNING - ************************************************************
> *** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
> *** WARNING -
> *** WARNING - Spot instances can take a long time to come up and may not
> *** WARNING - come up at all depending on the current AWS load and your
> *** WARNING - max spot bid price.
> *** WARNING -
> *** WARNING - StarCluster will wait indefinitely until all instances (2)
> *** WARNING - come up. If this takes too long, you can cancel the start
> *** WARNING - command using CTRL-C. You can then resume the start command
> *** WARNING - later on using the --no-create (-x) option:
> *** WARNING -
> *** WARNING - $ starcluster start -x jobscluster
> *** WARNING -
> *** WARNING - This will use the existing spot instances launched
> *** WARNING - previously and continue starting the cluster. If you don't
> *** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
> *** WARNING - simply terminate the cluster using the 'terminate' command.
> *** WARNING - ************************************************************
>
> *** WARNING - Waiting 5 seconds before continuing...
> *** WARNING - Press CTRL-C to cancel...
> 5...4...3...2...1...
> >>> Validating cluster template settings...
> >>> Cluster template settings are valid
> >>> Starting cluster...
> >>> Launching a 2-node cluster...
> >>> Launching master node (ami: ami-4b9f0a22, type: m1.small)...
> >>> Creating security group _at_sc-jobscluster...
> Reservation:r-22c1d659
> >>> Launching node001 (ami: ami-4b9f0a22, type: m1.small)
> SpotInstanceRequest:sir-654c2614
> >>> Waiting for cluster to come up... (updating every 30s)
> >>> Waiting for open spot requests to become active...
> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for all nodes to be in a 'running' state...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for SSH to come up on all nodes...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for cluster to come up took 5.990 mins
> >>> The master node is ec2-50-16-56-237.compute-1.amazonaws.com
> >>> Setting up the cluster...
> >>> Attaching volume vol-52fa8f23 to master node on /dev/sdz ...
> >>> Configuring hostnames...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> *** WARNING - Cannot find device /dev/xvdz for volume vol-52fa8f23
> *** WARNING - Not mounting vol-52fa8f23 on /usr/share/jobs/
> *** WARNING - This usually means there was a problem attaching the EBS
> volume to the master node
> <snip>
>
> However, starcluster listclusters shows the volume attached to the master:
>
> starcluster mary$ sc listclusters
> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster_at_mit.edu
>
> ---------------------------------------------
> jobscluster (security group: _at_sc-jobscluster)
> ---------------------------------------------
> Launch time: 2013-02-12 18:51:26
> Uptime: 0 days, 00:36:27
> Zone: us-east-1c
> Keypair: lapuserkey
> EBS volumes:
> vol-52fa8f23 on master:/dev/sdz (status: attached)
> vol-e6e39697 on master:/dev/sda (status: attached)
> vol-bce99ccd on node001:/dev/sda (status: attached)
> Spot requests: 1 active
> Cluster nodes:
> master running i-859591f5 ec2-50-16-56-237.compute-1.amazonaws.com
> node001 running i-679d9917 ec2-54-234-176-219.compute-1.amazonaws.com(spot sir-654c2614)
> Total nodes: 2
>
> ...but on the master itself, neither /dev/sdz nor /dev/xvdz shows up:
>
> [root_at_master ~]# ls /dev/sd*
> /dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdad /dev/sdb
>
> [root_at_master ~]# ls /dev/xvd*
> /dev/xvdad /dev/xvde /dev/xvde1 /dev/xvde2 /dev/xvde3 /dev/xvdf
>
> Thanks again for any suggestions on how to get this volume to successfully
> mount on the master.
>
> Lyn
>
>
> On Tue, Feb 12, 2013 at 1:50 PM, Lyn Gerner <schedulerqueen_at_gmail.com> wrote:
>
>> Hi All,
>>
>> I've been receiving an error, consistently, from multiple attempts to
>> boot a cluster that references an EBS volume that I've created
>> w/"starcluster createvolume":
>>
>> Here is the output from the most recent createvolume; looks like
>> everything goes fine:
>>
>> .starcluster mary$ alias sc=starcluster
>> .starcluster mary$ sc createvolume --name=usrsharejobs-cv5g-use1c 5
>> us-east-1c
>> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
>> Software Tools for Academics and Researchers (STAR)
>> Please submit bug reports to starcluster_at_mit.edu
>>
>> >>> No keypair specified, picking one from config...
>> >>> Using keypair: lapuserkey
>> >>> Creating security group _at_sc-volumecreator...
>> >>> No instance in group _at_sc-volumecreator for zone us-east-1c, launching
>> one now.
>> Reservation:r-de9f8aa5
>> >>> Waiting for volume host to come up... (updating every 30s)
>> >>> Waiting for all nodes to be in a 'running' state...
>> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> >>> Waiting for SSH to come up on all nodes...
>> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> >>> Waiting for cluster to come up took 1.447 mins
>> >>> Checking for required remote commands...
>> >>> Creating 5GB volume in zone us-east-1c
>> >>> New volume id: vol-53600b22
>> >>> Waiting for new volume to become 'available'...
>> >>> Attaching volume vol-53600b22 to instance i-6b714b1b...
>> >>> Formatting volume...
>> Filesystem label=
>> OS type: Linux
>> Block size=4096 (log=2)
>> Fragment size=4096 (log=2)
>> Stride=0 blocks, Stripe width=0 blocks
>> 327680 inodes, 1310720 blocks
>> 65536 blocks (5.00%) reserved for the super user
>> First data block=0
>> Maximum filesystem blocks=1342177280
>> 40 block groups
>> 32768 blocks per group, 32768 fragments per group
>> 8192 inodes per group
>> Superblock backups stored on blocks:
>> 32768, 98304, 163840, 229376, 294912, 819200, 884736
>>
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> This filesystem will be automatically checked every 33 mounts or
>> 180 days, whichever comes first. Use tune2fs -c or -i to override.
>> mke2fs 1.41.14 (22-Dec-2010)
>>
>> >>> Leaving volume vol-53600b22 attached to instance i-6b714b1b
>> >>> Not terminating host instance i-6b714b1b
>> *** WARNING - There are still volume hosts running: i-6b714b1b
>> *** WARNING - Run 'starcluster terminate volumecreator' to terminate
>> *all* volume host instances once they're no longer needed
>> >>> Your new 5GB volume vol-53600b22 has been created successfully
>> >>> Creating volume took 1.871 mins
>>
>> .starcluster mary$ sc terminate volumecreator
>> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
>> Software Tools for Academics and Researchers (STAR)
>> Please submit bug reports to starcluster_at_mit.edu
>>
>> Terminate EBS cluster volumecreator (y/n)? y
>> >>> Detaching volume vol-53600b22 from volhost-us-east-1c
>> >>> Terminating node: volhost-us-east-1c (i-6b714b1b)
>> >>> Waiting for cluster to terminate...
>> >>> Removing _at_sc-volumecreator security group
>>
>> .starcluster mary$ sc listvolumes
>> <snip>
>>
>> volume_id: vol-53600b22
>> size: 5GB
>> status: available
>> availability_zone: us-east-1c
>> create_time: 2013-02-12 13:12:16
>> tags: Name=usrsharejobs-cv5g-use1c
>>
>> <snip>
>>
>> So here is the subsequent attempt to boot a cluster that tries to mount
>> the new EBS volume:
>>
>> .starcluster mary$ sc start -b 0.25 -i m1.small -I m1.small -c
>> jobscluster jobscluster
>> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
>> Software Tools for Academics and Researchers (STAR)
>> Please submit bug reports to starcluster_at_mit.edu
>>
>> *** WARNING - ************************************************************
>> *** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
>> *** WARNING -
>> *** WARNING - Spot instances can take a long time to come up and may not
>> *** WARNING - come up at all depending on the current AWS load and your
>> *** WARNING - max spot bid price.
>> *** WARNING -
>> *** WARNING - StarCluster will wait indefinitely until all instances (2)
>> *** WARNING - come up. If this takes too long, you can cancel the start
>> *** WARNING - command using CTRL-C. You can then resume the start command
>> *** WARNING - later on using the --no-create (-x) option:
>> *** WARNING -
>> *** WARNING - $ starcluster start -x jobscluster
>> *** WARNING -
>> *** WARNING - This will use the existing spot instances launched
>> *** WARNING - previously and continue starting the cluster. If you don't
>> *** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
>> *** WARNING - simply terminate the cluster using the 'terminate' command.
>> *** WARNING - ************************************************************
>>
>> *** WARNING - Waiting 5 seconds before continuing...
>> *** WARNING - Press CTRL-C to cancel...
>> 5...4...3...2...1...
>> >>> Validating cluster template settings...
>> >>> Cluster template settings are valid
>> >>> Starting cluster...
>> >>> Launching a 2-node cluster...
>> >>> Launching master node (ami: ami-4b9f0a22, type: m1.small)...
>> >>> Creating security group _at_sc-jobscluster...
>> Reservation:r-ba8c99c1
>> >>> Launching node001 (ami: ami-4b9f0a22, type: m1.small)
>> SpotInstanceRequest:sir-a05ae014
>> >>> Waiting for cluster to come up... (updating every 30s)
>> >>> Waiting for open spot requests to become active...
>> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> >>> Waiting for all nodes to be in a 'running' state...
>> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> >>> Waiting for SSH to come up on all nodes...
>> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> >>> Waiting for cluster to come up took 6.245 mins
>> >>> The master node is ec2-54-242-244-139.compute-1.amazonaws.com
>> >>> Setting up the cluster...
>> >>> Attaching volume vol-53600b22 to master node on /dev/sdz ...
>> >>> Configuring hostnames...
>> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
>> 100%
>> *** WARNING - Cannot find device /dev/xvdz for volume vol-53600b22
>> *** WARNING - Not mounting vol-53600b22 on /usr/share/jobs
>> *** WARNING - This usually means there was a problem attaching the EBS
>> volume to the master node
>> <snip>
>>
>> So, per the relevant past email threads, I'm using the createvolume
>> command, and it still gives this error. I also tried creating the volume
>> thru the AWS console; the subsequent cluster boot fails at the same point
>> w/the same problem of not finding the device.
>>
>> I'll appreciate any suggestions.
>>
>> Thanks much,
>> Lyn
>>
>>
>>
>
Received on Wed Feb 13 2013 - 19:09:53 EST