StarCluster - Mailing List Archive

Re: Starcluster SGE usage

From: John St. John <no email>
Date: Thu, 18 Oct 2012 10:55:02 -0700

Ok, just submitted a pull request. I modified the SGE template "sge_pe_template" so that the allocation rule can be set when creating a new parallel environment, and I modified the SGE plugin so that it creates a by_node parallel environment alongside orte, with by_node using the $pe_slots allocation rule. I have tested this by creating a cluster (haven't tried adding/deleting nodes) and it seems to work. The changes are minimal, so I'm fairly confident I didn't introduce any new bugs.
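
Once that's in, requesting N slots that all have to come from a single node should just look like this (a sketch; job.sh is a placeholder script):

    qsub -pe by_node 4 -cwd job.sh

Because by_node uses $pe_slots, SGE will only place the job on a host that has 4 free slots, rather than spreading the request across nodes.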

Best,
John


On Oct 18, 2012, at 10:07 AM, John St. John <johnthesaintjohn_at_gmail.com> wrote:

> Whoops, what I meant to say is that I would like to hammer something out that gets the job done. I am on IRC now in the place you suggested (I think; I've never used IRC before).
>
> On Oct 18, 2012, at 9:04 AM, Justin Riley <jtriley_at_MIT.EDU> wrote:
>
>> Hey Guys,
>>
>> Glad you figured out what needed to be changed in the SGE
>> configuration. I've been meaning to add a bunch more options to the
>> SGE Plugin to configure things like this along with other SGE tuning
>> parameters for some time now but simply haven't had the time. If
>> either of you are interested in working on a PR to do this that'd be
>> awesome. All of the SGE magic is here:
>>
>> https://github.com/jtriley/StarCluster/blob/develop/starcluster/plugins/sge.py
>>
>> and here's the SGE install and parallel environment templates used by
>> StarCluster:
>>
>> https://github.com/jtriley/StarCluster/blob/develop/starcluster/templates/sge.py
>>
>> I'm happy to discuss the plugin and some of the changes that would be
>> needed on IRC (freenode: #starcluster).
>>
>> ~Justin
>>
>>
>> On 10/18/2012 08:30 AM, Gavin W. Burris wrote:
>>> Hi John,
>>>
>>> You got it. Keeping all slots on the same node requires $pe_slots.
>>> This is the same setting you would use for something like OpenMP.
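>>>
>>> For an OpenMP-style job the usual pattern is to tie the thread count
>>> to the slots the PE grants, e.g. (untested sketch; by_node stands in
>>> for whatever $pe_slots PE you create, and my_openmp_prog is a
>>> placeholder):
>>>
>>>     #!/bin/sh
>>>     #$ -pe by_node 8
>>>     #$ -cwd
>>>     # SGE exports NSLOTS with the number of slots actually granted
>>>     export OMP_NUM_THREADS=$NSLOTS
>>>     ./my_openmp_prog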
>>>
>>> As for configuring the queue automatically, maybe there is an option
>>> in the SGE plugin that we can set in the ~/.starcluster/config file?
>>> I'd like to know, too. If not, we could add some code, or keep a
>>> shell script on a persistent volume that runs the needed qconf
>>> commands after a new head node is started.
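>>>
>>> Something along these lines kept on the persistent volume would
>>> probably do it (untested sketch; the PE name by_node and the paths
>>> are just examples):
>>>
>>> #!/bin/sh
>>> # create a $pe_slots parallel environment, but only if it does not
>>> # already exist, so re-running the script never clobbers anything
>>> if ! qconf -sp by_node >/dev/null 2>&1; then
>>>     cat > /tmp/by_node.pe <<'EOF'
>>> pe_name            by_node
>>> slots              9999999
>>> user_lists         NONE
>>> xuser_lists        NONE
>>> start_proc_args    /bin/true
>>> stop_proc_args     /bin/true
>>> allocation_rule    $pe_slots
>>> control_slaves     TRUE
>>> job_is_first_task  TRUE
>>> urgency_slots      min
>>> accounting_summary FALSE
>>> EOF
>>>     qconf -Ap /tmp/by_node.pe
>>>     # make the new PE available in the default queue
>>>     qconf -aattr queue pe_list by_node all.q
>>> fi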
>>>
>>> Cheers.
>>>
>>>
>>> On 10/17/2012 05:23 PM, John St. John wrote:
>>>> Hi Gavin, thanks for pointing me in the right direction. I found
>>>> a solution that seems to work really well. Since "slots" is
>>>> already set up to equal the core count on each node, I just
>>>> needed a parallel environment that lets me request a certain
>>>> number of slots on a single node, rather than having them spread
>>>> across N nodes. Changing the allocation rule to $fill_up would
>>>> probably still overflow onto multiple nodes in the edge case. The
>>>> proper way to do this is with the $pe_slots allocation rule in
>>>> the parallel environment config file. Here is what I did:
>>>>
>>>> $ qconf -sp by_node     (create this with qconf -ap [name])
>>>>
>>>> pe_name            by_node
>>>> slots              9999999
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /bin/true
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $pe_slots
>>>> control_slaves     TRUE
>>>> job_is_first_task  TRUE
>>>> urgency_slots      min
>>>> accounting_summary FALSE
>>>>
>>>>
>>>> Then I modify the parallel environment list in all.q:
>>>>
>>>>     qconf -mq all.q
>>>>     pe_list    make orte by_node
>>>>
>>>> That does it! Wahoo!
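>>>>
>>>> To double-check that it behaves as expected, you can watch where
>>>> the slots actually land (sketch; job.sh is a placeholder):
>>>>
>>>>     qsub -pe by_node 8 job.sh
>>>>     qstat -g t    # every slot of the job should be listed under
>>>>                   # the same queue instance, i.e. a single host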
>>>>
>>>> Ok now the problem is that I want this done automatically
>>>> whenever a cluster is booted up, and if a node is added I want to
>>>> make sure these configurations aren't clobbered. Any suggestions
>>>> on making that happen?
>>>>
>>>> Thanks everyone for your time!
>>>>
>>>> Best, John
>>>>
>>>>
>>>> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris <bug_at_sas.upenn.edu>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> The default configuration will distribute jobs based on load,
>>>>> meaning new jobs land on the least loaded node. If you want to
>>>>> fill nodes, you can change the load formula on the scheduler
>>>>> config:
>>>>>
>>>>>     # qconf -msconf
>>>>>     load_formula    slots
>>>>>
>>>>> If you are using a parallel environment, the default can be
>>>>> changed to fill a node, as well:
>>>>>
>>>>>     # qconf -mp orte
>>>>>     allocation_rule    $fill_up
>>>>>
>>>>> You may want to consider making memory consumable to prevent
>>>>> over-subscription. An easy option may be to make an arbitrary
>>>>> consumable complex resource, say john_jobs, and set it to the
>>>>> max number you want running at one time:
>>>>>
>>>>>     # qconf -mc
>>>>>     john_jobs    jj    INT    <=    YES    YES    0    0
>>>>>
>>>>>     # qconf -me global
>>>>>     complex_values    john_jobs=10
>>>>>
>>>>> Then, when you submit a job, specify the resource:
>>>>>
>>>>>     $ qsub -l jj=1 ajob.sh
>>>>>
>>>>> Each job submitted in this way will consume one count of
>>>>> john_jobs, effectively limiting you to ten.
>>>>>
>>>>> Cheers.
>>>>>
>>>>>
>>>>> On 10/16/2012 06:32 PM, John St. John wrote:
>>>>>> Thanks Jesse!
>>>>>>
>>>>>> This does seem to work. I don't need to define -pe in this
>>>>>> case because the slots are actually limited per node.
>>>>>>
>>>>>> My only problem with this solution is that all jobs are now
>>>>>> limited to this hard-coded number of slots, and when nodes are
>>>>>> added to the cluster while it is running, the file is modified
>>>>>> and the line would need to be edited again. On other systems I
>>>>>> have seen the ability to specify that a job will use a specific
>>>>>> number of CPUs without being in a special parallel environment;
>>>>>> for example the "-l ncpus=X" option works elsewhere, but it
>>>>>> doesn't seem to with the default StarCluster setup. Also, it
>>>>>> looks like the "orte" parallel environment has some settings
>>>>>> very specific to MPI, and it has no problem splitting the
>>>>>> requested number of slots across multiple nodes, which I
>>>>>> definitely don't want. I just want to limit the number of jobs
>>>>>> per node, but be able to specify that at runtime.
>>>>>>
>>>>>> It looks like the grid engine is somehow aware of the number
>>>>>> of CPUs available on each node. I get this by running `qhost`:
>>>>>>
>>>>>> HOSTNAME   ARCH       NCPU  LOAD   MEMTOT   MEMUSE   SWAPTO  SWAPUS
>>>>>> -------------------------------------------------------------------
>>>>>> global     -          -     -      -        -        -       -
>>>>>> master     linux-x64  8     0.88   67.1G    1.5G     0.0     0.0
>>>>>> node001    linux-x64  8     0.36   67.1G    917.3M   0.0     0.0
>>>>>> node002    linux-x64  8     0.04   67.1G    920.4M   0.0     0.0
>>>>>> node003    linux-x64  8     0.04   67.1G    887.3M   0.0     0.0
>>>>>> node004    linux-x64  8     0.06   67.1G    911.4M   0.0     0.0
>>>>>>
>>>>>>
>>>>>> So it seems like there should be a way to tell qsub that job
>>>>>> X is using some subset of the available CPU, or RAM, so that
>>>>>> it doesn't oversubscribe the node.
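>>>>>>
>>>>>> (For the RAM side, I gather the usual approach is to make a
>>>>>> memory complex such as h_vmem consumable on each host and then
>>>>>> request it per job; something like this, untested, with the
>>>>>> value just an example:
>>>>>>
>>>>>>     qsub -l h_vmem=16G bigmem_job.sh
>>>>>>
>>>>>> With 67.1G per node, that would cap things at about four such
>>>>>> jobs per node.)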
>>>>>>
>>>>>> Thanks for your time!
>>>>>>
>>>>>> Best, John
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu_at_stanford.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> You can modify the all.q queue to assign a fixed number of
>>>>>>> slots to each node.
>>>>>>>
>>>>>>> * If I remember correctly, "$ qconf -mq all.q" will bring up
>>>>>>>   the configuration of the all.q queue in an editor.
>>>>>>> * Under the "slots" attribute should be a fairly long string
>>>>>>>   such as "[node001=16],[node002=16],...".
>>>>>>> * Try replacing the entire string with a single number such
>>>>>>>   as "2". This should give each host only two slots.
>>>>>>> * Save the configuration and try a simple submission with the
>>>>>>>   'orte' parallel environment and let me know if it works.
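>>>>>>>
>>>>>>> If you'd rather skip the editor, the same change can probably
>>>>>>> be made non-interactively (untested):
>>>>>>>
>>>>>>>     # set the slots attribute of all.q to 2 for every host
>>>>>>>     qconf -mattr queue slots 2 all.q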
>>>>>>>
>>>>>>> Jesse
>>>>>>>
>>>>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>>>>>>> <johnthesaintjohn_at_gmail.com> wrote:
>>>>>>>
>>>>>>> Hello, I am having issues telling qsub to limit the number of
>>>>>>> jobs run at any one time on each node of the cluster. On some
>>>>>>> systems this can be done with things like "qsub -l
>>>>>>> node=1:ppn=1" or "qsub -l procs=2". I even tried "qsub -l
>>>>>>> slots=2", but that gave me an error and told me to use a
>>>>>>> parallel environment. When I use the "orte" parallel
>>>>>>> environment, e.g. "-pe orte 2", I see "slots=2" in my qstat
>>>>>>> list, but everything gets executed on one node at the same
>>>>>>> parallelization as before. How do I limit the number of jobs
>>>>>>> per node? I am running a process that consumes a very large
>>>>>>> amount of RAM.
>>>>>>>
>>>>>>> Thanks, John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Gavin W. Burris
>>>>> Senior Systems Programmer
>>>>> Information Security and Unix Systems
>>>>> School of Arts and Sciences
>>>>> University of Pennsylvania
>>>>
>>>>
>>>
>>
>
Received on Thu Oct 18 2012 - 13:55:06 EDT
This archive was generated by hypermail 2.3.0.
