StarCluster - Mailing List Archive

Re: Starcluster SGE usage

From: Justin Riley <no email>
Date: Thu, 18 Oct 2012 12:04:32 -0400


Hey Guys,

Glad you figured out what needed to be changed in the SGE
configuration. I've been meaning to add a bunch more options to the
SGE Plugin to configure things like this along with other SGE tuning
parameters for some time now but simply haven't had the time. If
either of you are interested in working on a PR to do this that'd be
awesome. All of the SGE magic is here:

and here's the SGE install and parallel environment templates used by

I'm happy to discuss the plugin and some of the changes that would be
needed on IRC (freenode: #starcluster).


On 10/18/2012 08:30 AM, Gavin W. Burris wrote:
> Hi John,
> You got it. Keeping all on the same node requires $pe_slots. This
> is the same setting you would use for something like OpenMP.
> As for configuring the queue automatically, maybe there is an
> option in the SGE plugin that we can place in the
> ~/.starcluster/config file? I'd like to know, too. If not, we
> could maybe add some code. Or keep a shell script on a persistent
> volume that we run that does the needed qconf foo commands after
> starting a new head node.
> Cheers.
> On 10/17/2012 05:23 PM, John St. John wrote:
>> Hi Gavin, Thanks for pointing me in the right direction. I found
>> a great solution though that seems to work really well. Since the
>> "slots" is already set up to be equal to the core count on each
>> node, I just needed access to a parallel environment that allowed
>> me to submit jobs to nodes, but request a certain number of slots
>> on a single node rather than spread out across N nodes. Changing
>> the allocation rule to "fill" would probably still overflow into
>> multiple nodes at the edge case. The way to do this properly is
>> with the $pe_slots allocation rule in the parallel environment
>> config file. Here is what I did:
>> qconf -sp by_node (create this with qconf -ap [name])
>> pe_name            by_node
>> slots              9999999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $pe_slots
>> control_slaves     TRUE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE
>> Then I modify the parallel environment list in all.q:
>> qconf -mq all.q
>> pe_list make orte by_node
>> That does it! Wahoo!
>> Ok now the problem is that I want this done automatically
>> whenever a cluster is booted up, and if a node is added I want to
>> make sure these configurations aren't clobbered. Any suggestions
>> on making that happen?
>> Thanks everyone for your time!
>> Best, John
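
One way the by_node setup above could be scripted so it can be rerun after every cluster start, as a rough sketch rather than anything StarCluster ships. PE_FILE and write_pe_file are made-up names for illustration, and the script assumes qconf is on the PATH of an SGE admin user on the master and that -spl, -Ap and -aattr behave as described in the qconf man page.

```shell
#!/bin/sh
# Sketch only: recreate the by_node parallel environment from the thread
# and attach it to all.q. Names below are illustrative assumptions.
PE_NAME=by_node
PE_FILE="${TMPDIR:-/tmp}/${PE_NAME}.pe"   # hypothetical scratch location

# Write the PE definition, one attribute per line.
write_pe_file() {
    cat > "$1" <<EOF
pe_name            $PE_NAME
slots              9999999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$pe_slots
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
}

write_pe_file "$PE_FILE"

if command -v qconf >/dev/null 2>&1; then
    # Add the PE only if it is not already listed, so reruns are harmless.
    if ! qconf -spl 2>/dev/null | grep -qx "$PE_NAME"; then
        qconf -Ap "$PE_FILE"
    fi
    # Append by_node to the pe_list of all.q. Assumption: qconf complains
    # if the entry is already present, so a failure here is ignored.
    qconf -aattr queue pe_list "$PE_NAME" all.q || true
else
    echo "qconf not found; wrote $PE_FILE only" >&2
fi
```

With that in place, something like "qsub -pe by_node 4 job.sh" should request four slots on a single node. Whether the settings survive adding a node is not something this sketch addresses; rerunning it after the node is added is the simplest workaround.
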
>> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris wrote:
>>> Hi John,
>>> The default configuration will distribute jobs based on load,
>>> meaning new jobs land on the least loaded node. If you want to
>>> fill nodes, you can change the load formula on the scheduler
>>> config:
>>> # qconf -msconf
>>> load_formula slots
>>> If you are using a parallel environment, the default can be
>>> changed to fill a node, as well:
>>> # qconf -mp orte
>>> allocation_rule $fill_up
>>> You may want to consider making memory consumable to prevent
>>> over-subscription. An easy option may be to make an arbitrary
>>> consumable complex resource, say john_jobs, and set it to the
>>> max number you want running at one time:
>>> # qconf -mc
>>> john_jobs jj INT <= YES YES 0 0
>>> # qconf -me global
>>> complex_values john_jobs=10
>>> Then, when you submit a job, specify the resource:
>>> $ qsub -l jj=1
>>> Each job submitted in this way will consume one count of
>>> john_jobs, effectively limiting you to ten.
>>> Cheers.
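
The john_jobs consumable above could probably be set up without an editor session, along these lines. This is a sketch under assumptions: add_complex_line and the temp file are invented for illustration, and it relies on qconf -sc, -Mc and -mattr working the way the qconf man page describes, which is worth checking against your SGE version before trusting it.

```shell
#!/bin/sh
# Sketch only: define the john_jobs consumable non-interactively.
# Columns: name shortcut type relop requestable consumable default urgency
COMPLEX_LINE='john_jobs jj INT <= YES YES 0 0'
LIMIT=10

# Append the complex definition to a dump of the complex list,
# unless a john_jobs entry is already there.
# $1 = file holding the output of "qconf -sc"
add_complex_line() {
    grep -q '^john_jobs[[:space:]]' "$1" || printf '%s\n' "$COMPLEX_LINE" >> "$1"
}

if command -v qconf >/dev/null 2>&1; then
    tmp=$(mktemp)
    qconf -sc > "$tmp"          # dump current complexes
    add_complex_line "$tmp"
    qconf -Mc "$tmp"            # load the modified complex list from the file
    # Set the cluster-wide limit on the global host (assumption: -mattr
    # adds the entry if it is missing and updates it otherwise).
    qconf -mattr exechost complex_values "john_jobs=$LIMIT" global
    rm -f "$tmp"
else
    echo "qconf not found; nothing changed" >&2
fi
```

Jobs would then be submitted with "qsub -l jj=1" as described above, each one consuming one unit of john_jobs.
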
>>> On 10/16/2012 06:32 PM, John St. John wrote:
>>>> Thanks Jesse!
>>>> This does seem to work. I don't need to define -pe in this
>>>> case b/c the slots are actually limited per node.
>>>> My only problem with this solution is that all jobs are now
>>>> limited to this hard coded number of slots, and also when
>>>> nodes are added to the cluster while it is running the file
>>>> is modified and the line would need to be edited again. On
>>>> other systems I have seen the ability to specify that a job
>>>> will use a specific number of CPUs without being in a
>>>> special parallel environment. I have seen the "-l ncpus=X"
>>>> option working, but it doesn't seem to with the default
>>>> starcluster setup. Also it looks like the "orte" parallel
>>>> environment has some stuff very specific to MPI, and doesn't
>>>> have a problem splitting the requested number of slots
>>>> between multiple nodes, which I definitely don't want. I just
>>>> want to limit the number of jobs per node, but be able to
>>>> specify that at runtime.
>>>> It looks like the grid engine is somehow aware of the number
>>>> of CPUs available on each node. I get this by running
>>>> -------------------------------------------------------------------------------
>>>> global   -          -  -     -      -       -    -
>>>> master   linux-x64  8  0.88  67.1G  1.5G    0.0  0.0
>>>> node001  linux-x64  8  0.36  67.1G  917.3M  0.0  0.0
>>>> node002  linux-x64  8  0.04  67.1G  920.4M  0.0  0.0
>>>> node003  linux-x64  8  0.04  67.1G  887.3M  0.0  0.0
>>>> node004  linux-x64  8  0.06  67.1G  911.4M  0.0  0.0
>>>> So it seems like there should be a way to tell qsub that job
>>>> X is using some subset of the available CPU, or RAM, so that
>>>> it doesn't oversubscribe the node.
>>>> Thanks for your time!
>>>> Best, John
>>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu wrote:
>>>>> You can modify the all.q queue to assign a fixed number of
>>>>> slots to each node.
>>>>> * If I remember correctly, "$ qconf -mq all.q" will bring
>>>>>   up the configuration of the all.q queue in an editor.
>>>>> * Under the "slots" attribute should be a fairly lengthy
>>>>>   string such as "[node001=16],[node002=16],...".
>>>>> * Try replacing the entire string with a single number such
>>>>>   as "2". This should assign each host only two slots.
>>>>> * Save the configuration and try a simple submission with the
>>>>>   'orte' parallel environment and let me know if it works.
>>>>> Jesse
>>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John wrote:
>>>>> Hello, I am having issues telling qsub to limit the number
>>>>> of jobs run at any one time on each node of the cluster.
>>>>> There are sometimes ways to do this with things like "qsub
>>>>> -l node=1:ppn=1" or "qsub -l procs=2" or something. I even
>>>>> tried "qsub -l slots=2" but that gave me an error and told
>>>>> me to use the parallel environment. When I tried to use the
>>>>> "orte" parallel environment like "-pe orte 2" I see
>>>>> "slots=2" in my qstat list, but everything gets executed
>>>>> on one node at the same parallelization as before. How do I
>>>>> limit the number of jobs per node? I am running a process
>>>>> that consumes a very large amount of RAM.
>>>>> Thanks, John
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>> --
>>> Gavin W. Burris
>>> Senior Systems Programmer
>>> Information Security and Unix Systems
>>> School of Arts and Sciences
>>> University of Pennsylvania


Received on Thu Oct 18 2012 - 12:04:38 EDT