StarCluster - Mailing List Archive

Re: Starcluster SGE usage

From: John St. John <no email>
Date: Tue, 16 Oct 2012 15:32:47 -0700

Thanks Jesse!

This does seem to work. I don't need to define -pe in this case b/c the slots are actually limited per node.

My only problem with this solution is that all jobs are now limited to this hard-coded number of slots. Also, when nodes are added to the cluster while it is running, the file is modified and the line would need to be edited again. On other systems I have seen the ability to specify that a job will use a specific number of CPUs without being in a special parallel environment: there the "-l ncpus=X" option works, but it doesn't seem to with the default StarCluster setup. It also looks like the "orte" parallel environment has some stuff very specific to MPI, and it has no problem splitting the requested number of slots between multiple nodes, which I definitely don't want. I just want to limit the number of jobs per node, but be able to specify that at runtime.
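For the re-editing problem, here is a minimal sketch of how the slot change might be scripted rather than done by hand in the editor. It assumes that `qconf -rattr queue slots <N> <queue>` replaces the whole "slots" list of a queue on this Grid Engine version; I have not verified that on a StarCluster install. The function name `reset_queue_slots` and the `DRY_RUN` variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarCluster or SGE): replace the entire
# "slots" list of a queue with a single number.
# With DRY_RUN=1 (the default here) it only prints the command it would run.
reset_queue_slots() {
    queue="$1"
    slots="$2"
    # Refuse anything that is not a positive integer.
    case "$slots" in
        ''|*[!0-9]*|0)
            echo "slots must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "qconf -rattr queue slots $slots $queue"
    else
        # Assumption: -rattr replaces the whole slots list, per-host entries included.
        qconf -rattr queue slots "$slots" "$queue"
    fi
}
```

Something like `DRY_RUN=0 reset_queue_slots all.q 2` could then be re-run after nodes are added. It still applies one fixed limit to every job, so it does not solve the runtime part of the question.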

It looks like the grid engine is somehow aware of the number of CPUs available on each node. I get this by running `qhost`:
HOSTNAME  ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global    -          -     -     -       -       -       -
master    linux-x64  8     0.88  67.1G   1.5G    0.0     0.0
node001   linux-x64  8     0.36  67.1G   917.3M  0.0     0.0
node002   linux-x64  8     0.04  67.1G   920.4M  0.0     0.0
node003   linux-x64  8     0.04  67.1G   887.3M  0.0     0.0
node004   linux-x64  8     0.06  67.1G   911.4M  0.0     0.0


So it seems like there should be a way to tell qsub that job X is using some subset of the available CPU, or RAM, so that it doesn't oversubscribe the node.
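To make the arithmetic concrete, here is a rough sketch that reads `qhost`-style text and prints how many jobs would fit on each host if every job needed a given number of CPUs. It assumes the column layout shown above (NCPU in the third column, two header lines, then a "global" row); other Grid Engine versions may print different columns. The function name `jobs_per_host` is made up for illustration, and this only computes numbers. It does not configure the scheduler.

```shell
#!/bin/sh
# Hypothetical helper: read qhost-style text on stdin and print, for each host,
# how many jobs fit if every job needs $1 CPUs.
# Assumes NCPU is the third whitespace-separated column, as in the output above.
jobs_per_host() {
    awk -v need="$1" '
        BEGIN { if (need <= 0) exit 1 }      # avoid dividing by zero
        NR <= 2 || $1 == "global" { next }   # skip header, rule line, and global row
        $3 ~ /^[0-9]+$/ { print $1, int($3 / need) }
    '
}
```

Usage would be something like `qhost | jobs_per_host 4`, which prints one "host count" pair per line.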

Thanks for your time!

Best,
John





On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:

> You can modify the all.q queue to assign a fixed number of slots to each node.
> If I remember correctly, "$ qconf -mq all.q" will bring up the configuration of the all.q queue in an editor.
> Under the "slots" attribute there should be a fairly long string such as "[node001=16],[node002=16],..."
> Try replacing the entire string with a single number such as "2". This should give each host only two slots.
> Save the configuration and try a simple submission with the 'orte' parallel environment and let me know if it works.
> Jesse
>
> On Tue, Oct 16, 2012 at 1:37 PM, John St. John <johnthesaintjohn_at_gmail.com> wrote:
> Hello,
> I am having issues telling qsub to limit the number of jobs run at any one time on each node of the cluster. There are sometimes ways to do this with things like "qsub -l node=1:ppn=1" or "qsub -l procs=2" or something. I even tried "qsub -l slots=2", but that gave me an error and told me to use the parallel environment. When I tried to use the "orte" parallel environment like "-pe orte 2", I see "slots=2" in my qstat list, but everything gets executed on one node at the same parallelization as before. How do I limit the number of jobs per node? I am running a process that consumes a very large amount of RAM.
>
> Thanks,
> John
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
Received on Tue Oct 16 2012 - 18:32:55 EDT
This archive was generated by hypermail 2.3.0.
