StarCluster - Mailing List Archive

EBS volume not mounting on restart

From: Larour, Eric (385J) <eric.larour@jpl.nasa.gov>
Date: Tue, 9 Oct 2012 05:41:35 +0000

Dear folks,

I have the following problem while creating a cluster and mounting an ebs volume
on /data. Here is the config file part corresponding to my template:

[cluster issm]
# change this to the name of one of the keypair sections defined above
KEYNAME = ISSMStarCluster
# number of ec2 instances to launch
CLUSTER_SIZE = 2
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-4583572c
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, hi1.4xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = cc2.8xlarge
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
VOLUMES = issm

# Sections starting with "volume" define your EBS volumes
[volume issm]
VOLUME_ID = vol-7d113b07
MOUNT_PATH = /data
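
For reference, before starting the cluster I can confirm the volume exists and is available with a quick boto check. This is just a rough sketch (assuming boto is configured with my AWS credentials and the volume lives in us-east-1):

    # Rough boto sketch: confirm the volume exists, check its size/zone,
    # and see whether it is currently attached anywhere.
    # Assumptions: credentials configured for boto, region us-east-1.
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')
    vol = conn.get_all_volumes(volume_ids=['vol-7d113b07'])[0]
    print("%s size=%sGB zone=%s status=%s" % (vol.id, vol.size, vol.zone, vol.status))
    print("attached to: %s at %s" % (vol.attach_data.instance_id, vol.attach_data.device))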



When I first start this cluster with starcluster start issm, everything works perfectly:

$ starcluster start issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: issm
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-issm...
>>> Creating placement group @sc-issm...
Reservation:r-e3538485
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 2.281 mins
>>> The master node is ec2-107-22-25-149.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-7d113b07 to master node on /dev/sdz ...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-7d113b07 on /data...
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
0/2 | | 0%


2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home /data
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.152 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.102 mins
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 1.506 mins
>>> Starting cluster took 3.877 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm



I checked, and /data is correctly mounted on my EBS volume; everything is fine.
Here is the df output:

root@master:/data# df
Filesystem  1K-blocks      Used  Available  Use%  Mounted on
/dev/sda1     8246240   5386292    2441056   69%  /
udev         31263832         4   31263828    1%  /dev
tmpfs        12507188       220   12506968    1%  /run
none             5120         0       5120    0%  /run/lock
none         31267964         0   31267964    0%  /run/shm
/dev/xvdb   866917368    205028  822675452    1%  /mnt
/dev/xvdz   103212320    192268   97777172    1%  /data

The EBS volume I'm mounting is 100 GB in size, so everything checks out.
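
(Just to show the arithmetic behind "checks out", a trivial Python check:

    # quick arithmetic check of the df figure for /dev/xvdz
    blocks_1k = 103212320
    print("%.1f GiB" % (blocks_1k / 1024.0**2))   # ~98.4 GiB usable, consistent with a
                                                  # 100 GB volume minus filesystem overhead
)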


Now, if I stop the cluster and start it again using the -x option, the cluster boots
fine, but the volume is never attached (no attempt is made at all) and /data is never
mounted. It's as though the [volume issm] section of my config did not exist!


Here is the output of the starcluster start -c issm -x issm command:

$ starcluster start -c issm -x issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Validating existing instances...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Starting stopped node: node001
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.780 mins
>>> The master node is ec2-23-22-242-221.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.106 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.065 mins
>>> Removing previous SGE installation...
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 0.846 mins
>>> Starting cluster took 2.647 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm




As you can see, no attempt was made to attach the EBS volume, and /data was never
mounted. When I log in, there is no EBS device for /data either.
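
For now, the only workaround I can see is to re-attach and mount the volume by hand after
the restart. A minimal boto sketch of that (the instance id below is just a placeholder
for the master's actual id, and I'm assuming the device shows up as /dev/xvdz exactly as
in the first run):

    # Manual workaround sketch: re-attach the volume to the master myself.
    # Assumptions: region us-east-1, master instance id filled in by hand,
    # device exposed to the kernel as /dev/xvdz (as in the first start).
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')
    conn.attach_volume('vol-7d113b07', 'i-xxxxxxxx', '/dev/sdz')  # placeholder instance id
    # once the attachment completes, on the master:
    #   mkdir -p /data && mount /dev/xvdz /data
    #   (and re-export /data over NFS to the nodes if needed)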




Any help or pointers would be appreciated!

Thanks in advance!

Eric L.

--------------------------------------------------------------------------
Dr. Eric Larour, Software Engineer III,
ISSM Task Manager (http://issm.jpl.nasa.gov)
Mechanical division, Propulsion Thermal and Materials Section, Applied Low Temperature Physics Group.
Jet Propulsion Laboratory.
MS 79-24, 4800 Oak Grove Drive, Pasadena CA 91109.
eric.larour@jpl.nasa.gov
http://issm.jpl.nasa.gov
Tel: 1 818 393 2435.
 --------------------------------------------------------------------------
Received on Tue Oct 09 2012 - 01:41:39 EDT