Re: Starcluster SGE usage


From: Gavin W. Burris <no email>
Date: Wed, 17 Oct 2012 11:16:15 -0400

Hi John,

The default configuration distributes jobs based on load, meaning
new jobs land on the least loaded node. If you want to fill nodes instead,
you can change the load formula in the scheduler configuration:
# qconf -msconf
load_formula slots
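
The same change can probably be scripted instead of made in the editor.
The sketch below is only an illustration: set_load_formula is a
hypothetical helper name, /tmp/sconf.new is an arbitrary path, and the
sample line assumes the stock np_load_avg formula. The qconf calls are
left as comments so the snippet runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the load_formula line of a scheduler
# configuration dump. Reads the config on stdin, writes it to stdout.
set_load_formula() {
    sed "s/^load_formula[[:space:]].*/load_formula $1/"
}

# On a live cluster (as an SGE admin user) this could be applied as:
#   qconf -ssconf | set_load_formula slots > /tmp/sconf.new
#   qconf -Msconf /tmp/sconf.new

# Demonstration on a sample line, no cluster needed:
printf 'load_formula   np_load_avg\n' | set_load_formula slots
```

Checking the result with "qconf -ssconf" before submitting jobs is a
good idea, since a malformed line can be rejected by qconf.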

If you are using a parallel environment, its allocation rule can also
be changed to fill one node before moving on to the next:
# qconf -mp orte
allocation_rule $fill_up
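
To make the effect of $fill_up concrete, here is a toy model of the
idea. It is not Grid Engine's actual scheduler code: the fill_up
function and the host1, host2 labels are invented for illustration, and
the real scheduler also orders hosts by load before filling them.

```shell
#!/bin/sh
# Illustration only: hand out REQUESTED slots by exhausting each host's
# free slots before moving to the next host.
# usage: fill_up REQUESTED FREE_ON_HOST1 FREE_ON_HOST2 ...
fill_up() {
    need=$1; shift
    i=1
    for free in "$@"; do
        # Stop once the request is satisfied.
        [ "$need" -gt 0 ] || break
        take=$free
        if [ "$take" -gt "$need" ]; then
            take=$need
        fi
        echo "host$i: $take"
        need=$((need - take))
        i=$((i + 1))
    done
}

# A 10-slot request against three hosts with 8 free slots each.
fill_up 10 8 8 8
```

In this model a request only spills onto a second host once the first
one has no free slots left, so a request larger than one host's free
slots is still split.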

You may want to consider making memory consumable to prevent
over-subscription. An easy option may be to make an arbitrary
consumable complex resource, say john_jobs, and set it to the max number
you want running at one time:
# qconf -mc
john_jobs jj INT <= YES YES 0 0
# qconf -me global
complex_values john_jobs=10

Then, when you submit a job, specify the resource:
$ qsub -l jj=1 ajob.sh

Each job submitted in this way will consume one count of john_jobs,
effectively limiting you to ten such jobs running at once across the
cluster, since the resource is defined on the global host.
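
For the memory route, a rough sketch follows. It assumes h_vmem has
first been marked consumable with "qconf -mc" (consumable column set
to YES), and that each node should advertise 64G; that figure is an
assumption loosely based on the 67.1G shown by qhost. The helper name
mem_capacity_cmds is made up, and it only prints the commands so they
can be reviewed before being run.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): print the qconf commands that
# would set a per-host h_vmem capacity. Nothing is changed on the
# cluster; pipe the output to sh as an SGE admin user to apply it.
# usage: mem_capacity_cmds CAPACITY HOST...
mem_capacity_cmds() {
    capacity=$1; shift
    for host in "$@"; do
        echo "qconf -mattr exechost complex_values h_vmem=$capacity $host"
    done
}

# Host names taken from the qhost listing quoted in this thread.
mem_capacity_cmds 64G node001 node002 node003 node004
```

A job would then request its share with something like
"qsub -l h_vmem=8G ajob.sh". Note that h_vmem is also enforced as a
hard limit, so a job that exceeds its request is killed.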

Cheers.

On 10/16/2012 06:32 PM, John St. John wrote:
> Thanks Jesse!
>
> This does seem to work. I don't need to define -pe in this case b/c the
> slots are actually limited per node.
>
> My only problem with this solution is that all jobs are now limited to
> this hard coded number of slots, and also when nodes are added to the
> cluster while it is running the file is modified and the line would need
> to be edited again. On other systems I have seen the ability to specify
> that a job will use a specific number of CPUs without being in a
> special parallel environment. I have seen the "-l ncpus=X" option
> working, but it doesn't seem to with the default StarCluster setup. Also
> it looks like the "orte" parallel environment has some stuff very
> specific to MPI, and doesn't have a problem splitting the requested
> number of slots between multiple nodes, which I definitely don't want. I
> just want to limit the number of jobs per node, but be able to specify
> that at runtime.
>
> It looks like the grid engine is somehow aware of the number of CPUs
> available on each node. I get this by running `qhost`:
> HOSTNAME  ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> ---------------------------------------------------------------
> global    -          -     -     -       -       -       -
> master    linux-x64  8     0.88  67.1G   1.5G    0.0     0.0
> node001   linux-x64  8     0.36  67.1G   917.3M  0.0     0.0
> node002   linux-x64  8     0.04  67.1G   920.4M  0.0     0.0
> node003   linux-x64  8     0.04  67.1G   887.3M  0.0     0.0
> node004   linux-x64  8     0.06  67.1G   911.4M  0.0     0.0
>
>
> So it seems like there should be a way to tell qsub that job X is using
> some subset of the available CPU, or RAM, so that it doesn't
> oversubscribe the node.
>
> Thanks for your time!
>
> Best,
> John
>
>
>
>
>
> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
>
>> You can modify the all.q queue to assign a fixed number of slots to
>> each node.
>>
>>   * If I remember correctly, "$ qconf -mq all.q" will bring up the
>>     configuration of the all.q queue in an editor.
>>   * The "slots" attribute should hold a fairly long string such
>>     as "[node001=16],[node002=16],..."
>>   * Try replacing the entire string with a single number such as "2".
>>     This should give each host only two slots.
>>   * Save the configuration and try a simple submission with the 'orte'
>>     parallel environment and let me know if it works.
>>
>> Jesse
>>
>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>> <johnthesaintjohn_at_gmail.com> wrote:
>>
>> Hello,
>> I am having issues telling qsub to limit the number of jobs run at
>> any one time on each node of the cluster. There are sometimes ways
>> to do this with things like "qsub -l node=1:ppn=1" or "qsub -l
>> procs=2" or something. I even tried "qsub -l slots=2" but that
>> gave me an error and told me to use the parallel environment. When
>> I tried to use the "orte" parallel environment like "-pe orte 2" I
>> see "slots=2" in my qstat list, but everything gets executed on
>> one node at the same parallelization as before. How do I limit the
>> number of jobs per node? I am running a process that consumes a
>> very large amount of ram.
>>
>> Thanks,
>> John
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>

-- 
Gavin W. Burris
Senior Systems Programmer
Information Security and Unix Systems
School of Arts and Sciences
University of Pennsylvania

Received on Wed Oct 17 2012 - 11:16:28 EDT
