StarCluster - Mailing List Archive

EBS volume not mounting on restart

From: Larour, Eric (385J) <eric.larour_at_jpl.nasa.gov>
Date: Tue, 9 Oct 2012 05:41:35 +0000

Dear folks,

I have the following problem while creating a cluster and mounting an EBS volume
on /data. Here is the part of my config file corresponding to my template:

[cluster issm]
# change this to the name of one of the keypair sections defined above
KEYNAME = ISSMStarCluster
# number of ec2 instances to launch
CLUSTER_SIZE = 2
# create the following user on the cluster
CLUSTER_USER = sgeadmin
# optionally specify shell (defaults to bash)
# (options: tcsh, zsh, csh, bash, ksh)
CLUSTER_SHELL = bash
# AMI to use for cluster nodes. These AMIs are for the us-east-1 region.
# Use the 'listpublic' command to list StarCluster AMIs in other regions
# The base i386 StarCluster AMI is ami-899d49e0
# The base x86_64 StarCluster AMI is ami-999d49f0
# The base HVM StarCluster AMI is ami-4583572c
NODE_IMAGE_ID = ami-4583572c
# instance type for all cluster nodes
# (options: cg1.4xlarge, c1.xlarge, m1.small, c1.medium, m2.xlarge, t1.micro, cc1.4xlarge, m1.medium, cc2.8xlarge, m1.large, m1.xlarge, hi1.4xlarge, m2.4xlarge, m2.2xlarge)
NODE_INSTANCE_TYPE = cc2.8xlarge
# Uncomment to disable installing/configuring a queueing system on the
# cluster (SGE)
#DISABLE_QUEUE=True
# Uncomment to specify a different instance type for the master node (OPTIONAL)
# (defaults to NODE_INSTANCE_TYPE if not specified)
#MASTER_INSTANCE_TYPE = m1.small
# Uncomment to specify a separate AMI to use for the master node. (OPTIONAL)
# (defaults to NODE_IMAGE_ID if not specified)
#MASTER_IMAGE_ID = ami-899d49e0 (OPTIONAL)
# availability zone to launch the cluster in (OPTIONAL)
# (automatically determined based on volumes (if any) or
# selected by Amazon if not specified)
#AVAILABILITY_ZONE = us-east-1c
# list of volumes to attach to the master node (OPTIONAL)
# these volumes, if any, will be NFS shared to the worker nodes
# see "Configuring EBS Volumes" below on how to define volume sections
VOLUMES = issm

# Sections starting with "volume" define your EBS volumes
[volume issm]
VOLUME_ID = vol-7d113b07
MOUNT_PATH = /data
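
For reference, one way to sanity-check the volume before starting the cluster is shown below. This is only a rough sketch: it assumes the listvolumes command is available in this StarCluster version, and the second check assumes the EC2 API tools are installed and configured.

    # Confirm vol-7d113b07 exists, is 'available', and sits in the
    # availability zone the cluster will launch in
    $ starcluster listvolumes

    # Same check with the EC2 API tools, if installed
    $ ec2-describe-volumes vol-7d113b07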



When I first start this cluster with 'starcluster start issm', everything works perfectly:

$ starcluster start issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Using default cluster template: issm
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-issm...
>>> Creating placement group @sc-issm...
Reservation:r-e3538485
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 2.281 mins
>>> The master node is ec2-107-22-25-149.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-7d113b07 to master node on /dev/sdz ...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-7d113b07 on /data...
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
0/2 | | 0%


2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home /data
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.152 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.102 mins
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 1.506 mins
>>> Starting cluster took 3.877 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm



I checked, and /data is correctly mounted on my EBS volume; everything is fine.
Here is a df dump:

root@master:/data# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 8246240 5386292 2441056 69% /
udev 31263832 4 31263828 1% /dev
tmpfs 12507188 220 12506968 1% /run
none 5120 0 5120 0% /run/lock
none 31267964 0 31267964 0% /run/shm
/dev/xvdb 866917368 205028 822675452 1% /mnt
/dev/xvdz 103212320 192268 97777172 1% /data

The EBS volume I'm mounting is 100 GB in size, so everything checks out.


Now, if I stop the cluster and start it again using the -x option, the cluster boots
fine, but StarCluster does not attach the volume (it does not attempt it at all) and does not
even try to mount /data. It's as though the [volume issm] section of my config did not exist!
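
To see what is actually going on after the -x restart, the volume can be checked from the outside and from the master itself. A rough sketch (it assumes the EC2 API tools are installed and configured; the device name /dev/xvdz comes from the df output above):

    # Check whether the volume is attached, and to which instance, after the restart
    $ ec2-describe-volumes vol-7d113b07

    # Check on the master: is the device present and is /data mounted?
    $ starcluster sshmaster issm
    root@master:~# ls /dev/xvd*
    root@master:~# df | grep /data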


Here is the output of the starcluster start -x issm command:

$ starcluster start -c issm -x issm
StarCluster - (http://web.mit.edu/starcluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu

>>> Validating existing instances...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Starting stopped node: node001
>>> Waiting for cluster to come up... (updating every 10s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.780 mins
>>> The master node is ec2-23-22-242-221.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating cluster user: None (uid: 1001, gid: 1001)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.106 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.065 mins
>>> Removing previous SGE installation...
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring cluster took 0.846 mins
>>> Starting cluster took 2.647 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster issm




As you can see, no attempt was made to attach the EBS volume, and no attempt was made to
mount /data. When I log in, there is no EBS volume device for /data either.
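
In the meantime, a manual workaround might look like the sketch below. It is not the StarCluster way of doing things, just a stopgap; it assumes the EC2 API tools are installed, that the master's instance id can be read off 'starcluster listclusters', and that /dev/sdz is still free on the master (the <master-instance-id> placeholder is hypothetical):

    # 1) Re-attach the volume to the master by hand
    $ ec2-describe-volumes vol-7d113b07                      # should report 'available'
    $ ec2-attach-volume vol-7d113b07 -i <master-instance-id> -d /dev/sdz

    # 2) Mount it on the master (it shows up as /dev/xvdz on this AMI)
    $ starcluster sshmaster issm
    root@master:~# mount /dev/xvdz /data

    # 3) Re-export /data over NFS so node001 sees it again
    root@master:~# echo '/data node001(rw,async,no_subtree_check)' >> /etc/exports
    root@master:~# exportfs -ra
    root@master:~# ssh node001 'mkdir -p /data && mount master:/data /data'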




Any help or pointers would be appreciated!

Thanks in advance!

Eric L.

--------------------------------------------------------------------------
Dr. Eric Larour, Software Engineer III,
ISSM Task Manager (http://issm.jpl.nasa.gov)
Mechanical division, Propulsion Thermal and Materials Section, Applied Low Temperature Physics Group.
Jet Propulsion Laboratory.
MS 79-24, 4800 Oak Grove Drive, Pasadena CA 91109.
eric.larour_at_jpl.nasa.gov
http://issm.jpl.nasa.gov
Tel: 1 818 393 2435.
--------------------------------------------------------------------------
Received on Tue Oct 09 2012 - 01:41:39 EDT