Hi Justin,
Thank you very much for your help. I meant to send a follow-up e-mail
earlier, but it slipped my mind. The orte parallel environment uses a
round-robin allocation rule ($round_robin) by default. This causes Sun
Grid Engine to invoke the OpenMP-threaded process several times on
multiple nodes. When I run 'top', there are dozens of job processes
running on each node, and the cluster becomes unresponsive to SSH requests.
To get around this, I created a new parallel environment, 'smp', and
replaced $round_robin with $pe_slots, which allocates all of a job's
slots on a single host.
This is done by first creating the environment with
$ qconf -ap smp
and then editing the allocation_rule field in the parallel environment
specification file with
$ qconf -mp smp
The file should look something like this:
pe_name smp
slots XXX
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE
where XXX is the number of nodes times the number of cores per node. Then
I edit the all.q queue specification file
$ qconf -mq all.q
and add the parallel environment 'smp' to the 'pe_list' field.
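For reference, the same setup can be scripted instead of going through the interactive editor. This is a minimal sketch, assuming qconf is on the PATH and you have SGE manager rights. NUM_NODES, CORES_PER_NODE and write_smp_pe are hypothetical names (the 2 x 8 defaults are placeholders for your own cluster). It relies on qconf -Ap, which adds a parallel environment from a file, and qconf -aattr, which appends a value to a queue attribute such as pe_list.

```shell
#!/bin/sh
# Sketch: create the 'smp' PE and attach it to all.q without an editor.
# NUM_NODES and CORES_PER_NODE are hypothetical; set them for your cluster.

write_smp_pe() {
    # $1 = output file, $2 = total slots (nodes x cores per node)
    cat > "$1" <<EOF
pe_name            smp
slots              $2
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

NUM_NODES=${NUM_NODES:-2}
CORES_PER_NODE=${CORES_PER_NODE:-8}
SLOTS=$((NUM_NODES * CORES_PER_NODE))   # the XXX value in the spec above

PE_FILE=$(mktemp)
write_smp_pe "$PE_FILE" "$SLOTS"
echo "PE spec written to $PE_FILE"

# Only touch SGE when qconf is available (run this as an SGE manager).
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap "$PE_FILE"                    # add the PE from the file
    qconf -aattr queue pe_list smp all.q    # append 'smp' to all.q's pe_list
fi
```

Writing the spec to a file first also leaves a copy you can keep under version control and re-apply when the cluster is rebuilt.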
Running qsub -pe smp 8 my-open-mp-script.sh then works smoothly. I am
only speculating, but if NSLOTS does not evenly divide the number of
cores on a machine, a job might be erroneously spread across multiple
nodes.
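A job script along these lines is one way to put this together. It is a sketch: OPENMP_BINARY is a hypothetical variable and the default path is a placeholder. The '#$' lines are SGE's embedded qsub directives, so plain 'qsub my-open-mp-script.sh' requests the PE without repeating -pe on the command line. The script also prints $PE_HOSTFILE, which SGE sets for parallel jobs; under $pe_slots it should list a single host, which is an easy way to check whether a job was spread across nodes.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe smp 8
# Sketch of an OpenMP job script for the 'smp' PE.
# OPENMP_BINARY is a hypothetical variable; point it at your own executable.

# SGE sets NSLOTS to the number of slots granted; default to 1 outside SGE.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Using $OMP_NUM_THREADS OpenMP threads on $(uname -n)"

# SGE sets PE_HOSTFILE for parallel jobs; under $pe_slots it should list one host.
if [ -n "$PE_HOSTFILE" ] && [ -r "$PE_HOSTFILE" ]; then
    cat "$PE_HOSTFILE"
fi

OPENMP_BINARY="${OPENMP_BINARY:-/path/to/my/openmp/binary}"
if [ -x "$OPENMP_BINARY" ]; then
    # "$@" forwards the script's arguments without re-splitting them.
    "$OPENMP_BINARY" "$@"
else
    # Report a missing or non-executable binary instead of failing silently.
    echo "error: $OPENMP_BINARY is not executable (chmod +x?)" >&2
fi
```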
To attain better CPU utilization, I set the number of threads higher
than the number of slots:
export OMP_NUM_THREADS=$((NSLOTS+4))
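The oversubscription margin can be wrapped in a small helper so it is easy to tune or switch off. This is only a sketch: omp_threads and EXTRA_THREADS are hypothetical names, and the default margin of 4 simply mirrors the +4 above. Keep in mind that SGE reserved only NSLOTS cores for the job, so the extra threads may compete with other jobs placed on the same host.

```shell
#!/bin/sh
# Sketch: derive OMP_NUM_THREADS from the slots SGE granted plus an optional
# oversubscription margin. EXTRA_THREADS is a hypothetical knob; 0 disables it.

omp_threads() {
    # $1 = granted slots (default 1), $2 = extra threads (default 0)
    slots=${1:-1}
    extra=${2:-0}
    echo $((slots + extra))
}

NSLOTS=${NSLOTS:-1}
EXTRA_THREADS=${EXTRA_THREADS:-4}
OMP_NUM_THREADS=$(omp_threads "$NSLOTS" "$EXTRA_THREADS")
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Setting EXTRA_THREADS=0 in the job environment falls back to one thread per reserved slot.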
Cheers,
Damian
On Mon, Apr 18, 2011 at 9:31 PM, Justin Riley <jtriley_at_mit.edu> wrote:
> Hi Damian,
>
> It's been a little while since I've played with OpenMP but from what I remember you need to set OMP_NUM_THREADS equal to the number of slots you allocate using the parallel environment. In theory, you should be able to use the same command:
>
> $ qsub -pe orte X open-mp-script.sh [args]
>
> And inside open-mp-script.sh you would need to export OMP_NUM_THREADS=$NSLOTS and then run your OpenMP binary like so:
>
> $ cat open-mp-script.sh
> export OMP_NUM_THREADS=$NSLOTS
> ...
> /path/to/my/openmp/binary $*
> ....
>
> Don't forget to make your binary executable (chmod +x <binary>). Let me know how this goes. If that doesn't work I'll look into this further.
>
> HTH,
>
> ~Justin
>
> On Apr 18, 2011, at 11:38 PM, Damian Eads wrote:
>
>> Hi,
>>
>> Last year, I was using MPI and it was suggested by Justin to use
>>
>> qsub -pe orte X mpi-job-script.sh [mpi job arguments]
>>
>> to add an MPI job to the queue (where X is the number of slots for the job).
>>
>> Now, my situation is slightly different. I am no longer using MPI but
>> OpenMP (you know, #pragma omp parallel before certain for loops). What
>> process manager should I use with Sun Grid Engine in this case? How
>> would I specify how many slots the job should use?
>>
>> Thank you in advance.
>>
>> Kind regards,
>>
>> Damian
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Sun May 01 2011 - 06:20:47 EDT