Hi,
Anyone have experience queueing up multiple MPI jobs on StarCluster?
Does every MPI job require at least one process running on the root
node? I'd rather not create several clusters just to run multiple MPI
jobs: I would need to replicate my data volume for each cluster and
manually keep the replicated volumes consistent.
For example, suppose I want to queue 100 MPI jobs, each requiring
three 8-core instances (24 cores per job). If I allocate 18 c1.xlarge
instances (18*8 = 144 cores), I could queue up the jobs with
qsub -pe orte 24 ./myjobscript.sh experiment-1
qsub -pe orte 24 ./myjobscript.sh experiment-2
qsub -pe orte 24 ./myjobscript.sh experiment-3
...
qsub -pe orte 24 ./myjobscript.sh experiment-100
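Rather than typing all 100 qsub lines, I'd probably generate them with
a loop. A sketch, assuming StarCluster's default Open MPI parallel
environment is named "orte" (check with `qconf -spl`); the commands
are only echoed here for preview, and piping the loop's output to sh
would actually submit them:

```shell
# Build one qsub command per experiment; each job asks the "orte"
# parallel environment for 24 slots. Pipe the printed lines to sh
# (or drop the echo) to submit for real.
cmds=$(for i in $(seq 1 100); do
  echo "qsub -pe orte 24 ./myjobscript.sh experiment-$i"
done)
printf '%s\n' "$cmds"
```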
where ./myjobscript.sh calls mpirun as follows
mpirun -byslot -x PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
    -wd /data/experiments \
    -host master,node001,node002,node003,node004,node005,...,node017 \
    -np 24 ./myprogram "$1"
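For what it's worth, here is an alternative shape for ./myjobscript.sh
that avoids hard-coding the host list at all — a sketch, assuming
StarCluster's Open MPI is built with SGE (gridengine) support so that
mpirun learns the granted hosts from the scheduler, and that SGE's
$NSLOTS variable matches the 24 slots requested at qsub time:

```shell
# Write myjobscript.sh: under SGE tight integration, mpirun discovers
# the hosts granted to this job by itself, so no -host list is needed.
cat > myjobscript.sh <<'EOF'
#!/bin/sh
# NSLOTS is set by SGE to the number of slots granted to this job;
# -bynode spreads the ranks round-robin across the granted nodes.
mpirun -bynode -x PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
       -wd /data/experiments \
       -np "$NSLOTS" ./myprogram "$1"
EOF
chmod +x myjobscript.sh
```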
Does anyone know if this will work? I'm concerned that when the first
job is started, the root node will have all of its cores used. From
the manpages, -byslot (the default) fills each node's slots in turn,
while -bynode allocates processes across the nodes in a round-robin
fashion. Perhaps -bynode, used carefully, would ensure that there is a
root MPI process running on the master node for every MPI job that's
simultaneously running.
If anyone has any experience and can give me a push in the right
direction, I'd greatly appreciate it.
Thanks!
Kind regards,
Damian
--
-----------------------------------------------------
Damian Eads                    Ph.D. Candidate
University of California       Computer Science
1156 High Street               Machine Learning Lab, E2-489
Santa Cruz, CA 95064           http://www.soe.ucsc.edu/~eads
Received on Sun Jun 20 2010 - 19:57:24 EDT