StarCluster - Mailing List Archive

Re: [Starcluster] Multiple MPI jobs on SunGrid Engine with StarCluster

From: Damian Eads <no email>
Date: Mon, 21 Jun 2010 10:02:35 -0700

Hi Justin,

Thank you for your explanation of MPI/SGE integration. A few quick questions:

On Mon, Jun 21, 2010 at 9:46 AM, Justin Riley <jtriley_at_mit.edu> wrote:
> So the above explains the parallel environment setup within SGE. It
> turns out that if you're using a parallel environment with OpenMPI, you
> do not have to specify --byslot/--bynodes/-np/-host/etc options to
> mpirun given that SGE will handle the round_robin/fill_up modes for you
> and automatically assign hosts and number of processors to be used by
> OpenMPI.
>
> So, for your use case I would change the commands as follows:
> -----------------------------------------------------------------
>    qsub -pe orte 24 ./myjobscript.sh experiment-1
>    qsub -pe orte 24 ./myjobscript.sh experiment-2
>    qsub -pe orte 24 ./myjobscript.sh experiment-3
>    ...
>    qsub -pe orte 24 ./myjobscript.sh experiment-100
>
> where ./myjobscript.sh calls mpirun as follows
>
>    mpirun -x PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
>           -wd /data/experiments ./myprogram $1

So I don't need to specify the number of slots to use in mpirun? Sun
Grid Engine will somehow pass this information to mpirun? Is mpirun's
-n argument 1 by default otherwise?
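
Just to check my understanding, here is roughly what I imagine
./myjobscript.sh would contain (a sketch based on your description,
assuming our OpenMPI was built with SGE support; myprogram and the
paths are from my setup):

    #!/bin/bash
    #$ -cwd
    #$ -S /bin/bash
    # Submitted as: qsub -pe orte 24 ./myjobscript.sh experiment-1
    # With SGE-aware OpenMPI, mpirun picks up the slot count and host
    # allocation from the parallel environment, so no -np or -hostfile:
    mpirun -x PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
           -wd /data/experiments ./myprogram $1

Is that right?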

> -----------------------------------------------------------------
> NOTE: You can also pass -wd to the qsub command instead of mpirun and
> along the same lines I believe you can pass -v option to qsub rather
> than -x to mpirun. Neither of these should make a difference, just
> shifts where the -x/-wd concern is (from MPI to SGE).

Good idea. This will clean things up a bit. Thanks for suggesting it.
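
If I follow, the all-qsub version would look something like this (just
a sketch of my reading of your suggestion, with the -v/-wd concerns
moved to the submit line):

    qsub -pe orte 24 -wd /data/experiments \
         -v PYTHONPATH=/data/prefix/lib/python2.6/site-packages \
         ./myjobscript.sh experiment-1

and then the mpirun line inside the job script shrinks to just:

    mpirun ./myprogram $1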

> I will add a section to the docs about using SGE/OpenMPI integration on
> StarCluster based on this email.
>
>> Perhaps if carefully
>> used, this will ensure that there is a root MPI process running on the
>> master node for every MPI job that's simultaneously running.
>
> Is this a requirement for you to have a root MPI process on the master
> node for every MPI job? If you're worried about oversubscribing the
> master node with MPI processes, then this SGE/OpenMPI integration should
> relieve those concerns. If not, what's the reason for needing a 'root
> MPI process' running on the master node for every MPI job?

It's not a requirement but reflects some ignorance on my part. Maybe
I'm confused about why the first node is called master. I was assuming
it had that name because it was performing some kind of special
coordination.

Do I still need to provide the -hostfile option? Or is this automatic now?
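My guess from your explanation is that it's automatic, and that I could
confirm what SGE handed each job by printing the PE variables from
inside the job script (assuming I have the variable names right):

    echo "SGE granted $NSLOTS slots"
    cat $PE_HOSTFILE   # one line per host: hostname, slots, queue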

Thanks a lot!

Damian

-----------------------------------------------------
Damian Eads                              Ph.D. Candidate
University of California                 Computer Science
1156 High Street                         Machine Learning Lab, E2-489
Santa Cruz, CA 95064                     http://www.soe.ucsc.edu/~eads
Received on Mon Jun 21 2010 - 13:02:37 EDT