StarCluster - Mailing List Archive

Re: Starcluster SGE usage

From: Gavin W. Burris <no email>
Date: Thu, 18 Oct 2012 08:30:58 -0400

Hi John,

You got it. Keeping all slots on the same node requires $pe_slots. This
is the same setting you would use for something like OpenMP.

As for configuring the queue automatically, maybe there is an option in
the SGE plugin that we can place in the ~/.starcluster/config file? I'd
like to know, too. If not, we could add some code, or keep a shell
script on a persistent volume that runs the needed qconf commands after
starting a new head node.
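
Something along these lines might work (an untested sketch; the volume
path and file name are just placeholders):

    #!/bin/bash
    # Untested sketch: restore the custom PE and queue settings after a
    # new head node comes up.  Assumes the PE definition was saved to the
    # persistent volume beforehand, e.g. with:
    #   qconf -sp by_node > /vol/sge/by_node.pe
    if ! qconf -spl | grep -qx by_node; then
        qconf -Ap /vol/sge/by_node.pe    # add the PE from the saved file
    fi
    # append by_node to all.q's pe_list if it is not already there
    qconf -sq all.q | grep -q by_node || \
        qconf -aattr queue pe_list by_node all.q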


On 10/17/2012 05:23 PM, John St. John wrote:
> Hi Gavin,
> Thanks for pointing me in the right direction. I found a great solution,
> though, that seems to work really well. Since "slots" is already set up
> to be equal to the core count on each node, I just needed a parallel
> environment that lets me submit jobs to nodes but request a certain
> number of slots on a single node, rather than spread across N nodes.
> Changing the allocation rule to "fill" would probably still overflow
> into multiple nodes in edge cases. The way to do this properly is with
> the $pe_slots allocation rule in the parallel environment config file.
> Here is what I did:
> qconf -sp by_node (create this with qconf -ap [name])
> pe_name by_node
> slots 9999999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $pe_slots
> control_slaves TRUE
> job_is_first_task TRUE
> urgency_slots min
> accounting_summary FALSE
> Then I modify the parallel environment list in all.q:
> qconf -mq all.q
> pe_list make orte by_node
> That does it! Wahoo!
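> With that in place, a job can grab all of its slots on a single node with
> something like this (the slot count and script name are just examples):
> qsub -pe by_node 8 job.sh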
> Ok now the problem is that I want this done automatically whenever a cluster is booted up, and if a node is added I want to make sure these configurations aren't clobbered. Any suggestions on making that happen?
> Thanks everyone for your time!
> Best,
> John
> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris <> wrote:
>> Hi John,
>> The default configuration will distribute jobs based on load, meaning
>> new jobs land on the least loaded node. If you want to fill nodes, you
>> can change the load formula on the scheduler config:
>> # qconf -msconf
>> load_formula slots
>> If you are using a parallel environment, the default can be changed to
>> fill a node, as well:
>> # qconf -mp orte
>> allocation_rule $fill_up
>> You may want to consider making memory consumable to prevent
>> over-subscription. An easy option may be to make an arbitrary
>> consumable complex resource, say john_jobs, and set it to the max number
>> you want running at one time:
>> # qconf -mc
>> john_jobs jj INT <= YES YES 0 0
>> # qconf -me global
>> complex_values john_jobs=10
>> Then, when you submit a job, specify the resource:
>> $ qsub -l jj=1
>> Each job submitted in this way will consume one count of john_jobs,
>> effectively limiting you to ten.
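>> The memory route, as a sketch, would look similar: mark h_vmem as
>> consumable in "qconf -mc", give each exec host a budget with "qconf -me
>> node001" and a line like complex_values h_vmem=64G, then submit with
>> "qsub -l h_vmem=8G". (The 64G and 8G figures are only examples.)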
>> Cheers.
>> On 10/16/2012 06:32 PM, John St. John wrote:
>>> Thanks Jesse!
>>> This does seem to work. I don't need to define -pe in this case because
>>> the slots are actually limited per node.
>>> My only problem with this solution is that all jobs are now limited to
>>> this hard-coded number of slots, and when nodes are added to the cluster
>>> while it is running, the file is modified and the line would need to be
>>> edited again. On other systems I have seen the ability to specify that a
>>> job will use a specific number of CPUs without being in a special
>>> parallel environment; for example, the "-l ncpus=X" option works there,
>>> but it doesn't seem to with the default StarCluster setup. Also, the
>>> "orte" parallel environment has some settings very specific to MPI, and
>>> it has no problem splitting the requested number of slots between
>>> multiple nodes, which I definitely don't want. I just want to limit the
>>> number of jobs per node, but be able to specify that at runtime.
>>> It looks like the grid engine is somehow aware of the number of CPUs
>>> available on each node. I get this by running `qhost`:
>>> HOSTNAME  ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> -------------------------------------------------------------------
>>> global    -             -     -       -       -       -       -
>>> master    linux-x64     8  0.88   67.1G    1.5G     0.0     0.0
>>> node001   linux-x64     8  0.36   67.1G  917.3M     0.0     0.0
>>> node002   linux-x64     8  0.04   67.1G  920.4M     0.0     0.0
>>> node003   linux-x64     8  0.04   67.1G  887.3M     0.0     0.0
>>> node004   linux-x64     8  0.06   67.1G  911.4M     0.0     0.0
>>> So it seems like there should be a way to tell qsub that job X is using
>>> some subset of the available CPUs or RAM, so that it doesn't
>>> oversubscribe the node.
>>> Thanks for your time!
>>> Best,
>>> John
>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu <> wrote:
>>>> You can modify the all.q queue to assign a fixed number of slots to
>>>> each node.
>>>> * If I remember correctly, "$ qconf -mq all.q" will bring up the
>>>> configuration of the all.q queue in an editor.
>>>> * Under the "slots" attribute should be a semi-lengthy string such
>>>> as "[node001=16],[node002=16],..."
>>>> * Try replacing the entire string with a single number such as "2".
>>>> This should give each host only two slots (see the before/after
>>>> example below the list).
>>>> * Save the configuration and try a simple submission with the 'orte'
>>>> parallel environment and let me know if it works.
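>>>> In other words, the edit would look roughly like this (using the
>>>> example values above):
>>>> before: slots [node001=16],[node002=16],...
>>>> after: slots 2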
>>>> Jesse
>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John <> wrote:
>>>> Hello,
>>>> I am having issues telling qsub to limit the number of jobs run at
>>>> any one time on each node of the cluster. There are sometimes ways
>>>> to do this with things like "qsub -l node=1:ppn=1" or "qsub -l
>>>> procs=2" or something. I even tried "qsub -l slots=2" but that
>>>> gave me an error and told me to use the parallel environment. When
>>>> I tried to use the "orte" parallel environment like "-pe orte 2" I
>>>> see "slots=2" in my qstat list, but everything gets executed on
>>>> one node at the same parallelization as before. How do I limit the
>>>> number of jobs per node? I am running a process that consumes a
>>>> very large amount of RAM.
>>>> Thanks,
>>>> John
>> --
>> Gavin W. Burris
>> Senior Systems Programmer
>> Information Security and Unix Systems
>> School of Arts and Sciences
>> University of Pennsylvania

Gavin W. Burris
Senior Systems Programmer
Information Security and Unix Systems
School of Arts and Sciences
University of Pennsylvania
Received on Thu Oct 18 2012 - 08:31:12 EDT
This archive was generated by hypermail 2.3.0.

