We are trying to run the load balancer when launching a cluster of QIIME
AMIs (QIIME is software for analyzing next-generation sequencing data) and
are running into some errors.
The load balancer works fine when running the stock StarCluster AMI, but we
have not been able to get QIIME (easily) installed on that AMI.
Our configuration and the full output, including the error, are below. Any
info would be great.
Thanks.
StarCluster configuration:

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=qiime
ENABLE_EXPERIMENTAL=True
###########################
## Defining Cluster ##
###########################
[cluster QIIMETest]
# change this to the name of one of the keypair sections defined above
KEYNAME = StarCluster
# number of ec2 instances to launch
CLUSTER_SIZE = 2
# AMI to use for all cluster nodes
NODE_IMAGE_ID = ami-d5cc8fbc #FDA QIIME 11.10 image
# instance type for all cluster nodes
# (options: m1.medium, m3.2xlarge, cc2.8xlarge, m1.large, c1.xlarge,
#  hs1.8xlarge, cr1.8xlarge, m1.small, c1.medium, cg1.4xlarge, m1.xlarge,
#  m2.xlarge, hi1.4xlarge, t1.micro, m2.4xlarge, m2.2xlarge, m3.xlarge,
#  cc1.4xlarge)
NODE_INSTANCE_TYPE = m2.2xlarge
VOLUMES = CFSANdata3
CLUSTER_SHELL = bash
CLUSTER_USER = ubuntu
#############################
## Configuring EBS Volumes ##
#############################
[volume CFSANdata3]
# attach the volume below to /home/ubuntu/CFSANdata on the master node and
# NFS-share it to the worker nodes
VOLUME_ID = vol-xxxxxx
MOUNT_PATH = /home/ubuntu/CFSANdata
[plugin ipcluster]
SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
Creating the cluster:

ubuntu@ip-10-181-159-232:~$ starcluster start -c QIIMETest STAR-ELASTIC
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-STAR-ELASTIC...
Reservation:r-f8c93594
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.148 mins
>>> The master node is ec2-50-19-65-196.compute-1.amazonaws.com
>>> Configuring cluster...
>>> Attaching volume vol-12183458 to master node on /dev/sdz ...
>>> Waiting for vol-12183458 to transition to: attached...
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-12183458 on /home/ubuntu/CFSANdata...
>>> Creating cluster user: ubuntu (uid: 1000, gid: 1000)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): ubuntu
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home /home/ubuntu/CFSANdata
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.087 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for ubuntu
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.018 mins
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Configuring cluster took 0.867 mins
>>> Starting cluster took 2.060 mins
The cluster is now ready to use. To login to the master node
as root, run:
$ starcluster sshmaster STAR-ELASTIC
If you're having issues with the cluster you can reboot the
instances and completely reconfigure the cluster from
scratch using:
$ starcluster restart STAR-ELASTIC
When you're finished using the cluster and wish to terminate
it and stop paying for service:
$ starcluster terminate STAR-ELASTIC
Alternatively, if the cluster uses EBS instances, you can
use the 'stop' command to shutdown all nodes and put them
into a 'stopped' state preserving the EBS volumes backing
the nodes:
$ starcluster stop STAR-ELASTIC
WARNING: Any data stored in ephemeral storage (usually /mnt)
will be lost!
You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:
$ starcluster start -x STAR-ELASTIC
This will start all 'stopped' nodes and reconfigure the
cluster.
ubuntu@ip-$ starcluster loadbalance -m 80 -a 2 -n 2 -d -w 60 STAR-ELASTIC
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)
>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 80
Minimum cluster size: 2
Cluster growth rate: 2 nodes/iteration
>>> Writing stats to file:
/home/ubuntu/.starcluster/sge/STAR-ELASTIC/sge-stats.csv
>>> Loading full job history
*** WARNING - Failed to retrieve stats (1/5):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 536, in get_stats
    return self._get_stats()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 507, in _get_stats
    qstatxml = '\n'.join(master.ssh.execute(qstat_cmd))
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && qstat -u \* -xml -f -r' failed with status 2:
qstat: invalid option -- 'm'
qstat: conflicting options.
usage:
qstat [-f [-1]] [-W site_specific] [-x] [ job_identifier... | destination... ]
qstat [-a|-i|-r|-e] [-u user] [-n [-1]] [-s] [-G|-M] [-R] [job_id... | destination...]
qstat -Q [-f [-1]] [-W site_specific] [ destination... ]
qstat -q [-G|-M] [ destination... ]
qstat -B [-f [-1]] [-W site_specific] [ server_name... ]
*** WARNING - Retrying in 60s
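One observation on the failure above: the usage text is Torque/PBS qstat syntax, not Grid Engine's, and Torque's qstat parses `-xml` as `-x -m -l` (hence "invalid option -- 'm'"). That suggests the QIIME AMI ships a Torque qstat that shadows the SGE qstat StarCluster installs under /opt/sge6 on the PATH the balancer's shell sees. A minimal demonstration of that shadowing with stand-in scripts (all paths and scripts below are invented, not from the cluster):

```shell
# Create two fake qstat binaries, one standing in for Torque's and one
# for SGE's, in separate directories.
mkdir -p /tmp/pathdemo/torque /tmp/pathdemo/sge
printf '#!/bin/sh\necho torque\n' > /tmp/pathdemo/torque/qstat
printf '#!/bin/sh\necho sge\n' > /tmp/pathdemo/sge/qstat
chmod +x /tmp/pathdemo/torque/qstat /tmp/pathdemo/sge/qstat

# With the Torque directory first on the PATH, its qstat wins:
PATH=/tmp/pathdemo/torque:/tmp/pathdemo/sge:$PATH qstat    # prints: torque

# With the SGE directory first, the SGE qstat shadows Torque's:
PATH=/tmp/pathdemo/sge:/tmp/pathdemo/torque:$PATH qstat    # prints: sge
```

If that is what is happening here, putting the SGE bin directory ahead of Torque's in the PATH that `source /etc/profile` produces on the master (or removing Torque's qstat from the QIIME AMI) should let the balancer's qstat call succeed.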
Received on Thu Aug 29 2013 - 08:14:16 EDT