StarCluster - Mailing List Archive

[Starcluster] Multiple MPI jobs on Sun Grid Engine with StarCluster

From: Damian Eads <no email>
Date: Sun, 20 Jun 2010 16:57:22 -0700

Hi,

Does anyone have experience queueing up multiple MPI jobs on StarCluster?
Does every MPI job require at least one process running on the root
node? I'd rather not create several clusters to run multiple MPI jobs,
since I would need to replicate my data volume for each cluster and
manually keep the data on the replicas consistent.

For example, suppose I want to queue 100 MPI jobs, with each job
requiring three 8-core instances (24 cores). If I allocate 18
c1.xlarge instances (18*8=144 cores), I could queue up the jobs with

   qsub -pe orte 24 ./myjobscript.sh experiment-1
   qsub -pe orte 24 ./myjobscript.sh experiment-2
   qsub -pe orte 24 ./myjobscript.sh experiment-3
   ...
   qsub -pe orte 24 ./myjobscript.sh experiment-100

where ./myjobscript.sh calls mpirun as follows

    mpirun -byslot -x PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
                -wd /data/experiments \
                -host master,node001,node002,node003,node004,node005,...,node018 \
                -np 24 ./myprogram $1
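
The submission pattern above can be sketched as a loop instead of 100 hand-typed commands. This is a minimal dry-run sketch, not a tested recipe. `PE_NAME` (defaulting to `orte`) and the `SUBMIT` override are names introduced here for illustration; `qsub -pe` expects a parallel environment name before the slot count, and the right name depends on how the cluster is configured. The script also prints the capacity arithmetic from the example.

```shell
#!/bin/sh
# Capacity arithmetic from the example: 18 c1.xlarge instances, 8 cores each.
instances=18
cores_per_instance=8
cores_per_job=24
total_cores=$((instances * cores_per_instance))
concurrent_jobs=$((total_cores / cores_per_job))
echo "total cores: $total_cores, jobs that fit at once: $concurrent_jobs"

# Assumption: the SGE parallel environment name; adjust to your cluster.
PE_NAME="${PE_NAME:-orte}"
# Dry run by default: prints each command. Set SUBMIT=qsub to really submit.
SUBMIT="${SUBMIT:-echo qsub}"

submitted=0
i=1
while [ "$i" -le 100 ]; do
    # $SUBMIT is left unquoted on purpose so "echo qsub" splits into two words.
    $SUBMIT -pe "$PE_NAME" "$cores_per_job" ./myjobscript.sh "experiment-$i"
    submitted=$((submitted + 1))
    i=$((i + 1))
done
echo "submitted: $submitted"
```

With these numbers, at most six 24-slot jobs can hold slots at the same time; the remaining jobs would normally wait in the queue until slots free up.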

Does anyone know if this will work? I'm concerned that when the first
job starts, it will use all of the cores on the root node. I
noticed the -byslot option in the mpirun man page, which allocates cores
across the cluster in a round-robin fashion. Perhaps, if used carefully,
this will ensure that there is a root MPI process running on the
master node for every MPI job that's running simultaneously.
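
One way to look at this concern: under a Grid Engine parallel environment, the scheduler records the hosts and slot counts granted to each job in the file named by `$PE_HOSTFILE` and sets `$NSLOTS`, so a job script can build its host list from that grant instead of naming every node. The sketch below is an illustration under assumptions, not a StarCluster-recommended method. The sample host file contents and the queue name `all.q` are made up so the script can be tried outside SGE, and it only prints the mpirun command it would run. As a side note, Open MPI's mpirun documentation describes `-byslot` as filling one node's slots before moving to the next, with `-bynode` being the option that cycles across nodes.

```shell
#!/bin/sh
# Outside SGE, fall back to a made-up grant of three 8-slot nodes (hypothetical data).
if [ -z "${PE_HOSTFILE:-}" ]; then
    PE_HOSTFILE=$(mktemp)
    printf '%s\n' \
        'node001 8 all.q@node001 UNDEFINED' \
        'node002 8 all.q@node002 UNDEFINED' \
        'node003 8 all.q@node003 UNDEFINED' > "$PE_HOSTFILE"
    NSLOTS=24
fi

# Each line of $PE_HOSTFILE is: hostname, slot count, queue, processor range.
hosts=$(awk '{print $1}' "$PE_HOSTFILE" | paste -sd, -)
slots=$(awk '{n += $2} END {print n}' "$PE_HOSTFILE")

echo "granted hosts: $hosts ($slots slots)"
# Print, rather than run, the command built from the grant.
echo "mpirun -host $hosts -np $NSLOTS ./myprogram ${1:-experiment-1}"
```

If each job's host list comes from its own grant, two jobs running at once would be placed on whatever slots the scheduler assigned them, which may or may not include the master node.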

If anyone has any experience and can give me a push in the right
direction, I'd greatly appreciate it.

Thanks!

Kind regards,

Damian




-- 
-----------------------------------------------------
Damian Eads                           Ph.D. Candidate
University of California             Computer Science
1156 High Street         Machine Learning Lab, E2-489
Santa Cruz, CA 95064    http://www.soe.ucsc.edu/~eads
Received on Sun Jun 20 2010 - 19:57:24 EDT
This archive was generated by hypermail 2.3.0.
