StarCluster - Mailing List Archive

Re: Using Array jobs in Starcluster

From: Rayson Ho <no email>
Date: Tue, 15 May 2012 10:50:17 -0400

Nathan,

Glad to know that it is working for you!! (Please cc the list as
Justin + many other helpful people are on the list.)

Back to your processor question - Yes, Open Grid Scheduler/Grid Engine
by default runs 1 job task per processor, and thus if you have 40
tasks in a job array then it will fully use all 40 processors.

However, in general you shouldn't just use n elements in an array job
for n nodes (unless you have many jobs). And if you are using spot
instances you might want to consider having more elements in the array
job, with each element doing less work. The reason is this:

Grid Engine treats each array task (i.e. each element in the array job)
as an atomic entry, and thus if it needs to rerun a job, it reruns the
whole array task. So if almost at the end of the execution of an array
task EC2 decides to terminate the spot instance (the probability is
around 4% according to Amazon), then Grid Engine will need to rerun
the whole task.
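
As a rough sketch of what smaller array elements could look like for a
200 million event run: the script below derives each task's share of
the events from the SGE_TASK_ID and SGE_TASK_LAST variables that Grid
Engine sets for array tasks. TOTAL_EVENTS and the events_for_task
helper are hypothetical names for illustration, and it assumes the
array is submitted as -t 1-N with the default step of 1. How the event
count is actually handed to the simulation is not shown, since that
depends on the program.

```shell
#!/bin/bash
#$ -S /bin/bash

# Hypothetical total for illustration; adjust to the real run size.
TOTAL_EVENTS=${TOTAL_EVENTS:-200000000}

# events_for_task TOTAL NTASKS TASK_ID
# Prints the number of events that task TASK_ID (1-based) should simulate.
# The first (TOTAL % NTASKS) tasks get one extra event, so the shares
# add up to TOTAL exactly.
events_for_task() {
    local total=$1 ntasks=$2 id=$3
    local base=$(( total / ntasks ))
    local extra=$(( total % ntasks ))
    if [ "$id" -le "$extra" ]; then
        echo $(( base + 1 ))
    else
        echo "$base"
    fi
}

# Inside an array job Grid Engine sets SGE_TASK_ID and SGE_TASK_LAST.
# The defaults below exist only so the script can be tried by hand.
TASK_ID=${SGE_TASK_ID:-1}
NTASKS=${SGE_TASK_LAST:-1}

EVENTS=$(events_for_task "$TOTAL_EVENTS" "$NTASKS" "$TASK_ID")
echo "task $TASK_ID of $NTASKS: $EVENTS events"
```

Submitted with qsub -t 1-400 instead of qsub -t 1-40, each task would
handle 500000 events rather than 5000000, so a terminated spot
instance costs correspondingly less rerun time.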

There is always a trade-off between job rerun cost and job scheduling
overhead. The best practice I have found is to have each task run for
at least 15 minutes and to leave schedule_interval at its default of
15 seconds. Alternatively, you can increase schedule_interval and set
flush_finish_sec to a low but non-zero value (but each task should
still run for at least around 5 minutes).

See the scheduler config manpage:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
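
To put rough numbers on that trade-off, here is a small sketch. The
helper names tasks_for_target and idle_percent are hypothetical, the
events-per-second rate is something you would have to measure for your
own program, and the idle estimate is a crude model that assumes a
freed slot waits at most one schedule_interval before the next task is
dispatched.

```shell
#!/bin/bash

# tasks_for_target TOTAL_EVENTS EVENTS_PER_SEC TARGET_SEC
# Prints the largest task count for which every task still runs for at
# least TARGET_SEC seconds (never less than 1 task).
tasks_for_target() {
    local total=$1 rate=$2 target=$3
    local n=$(( total / (rate * target) ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# idle_percent INTERVAL_SEC TASK_SEC
# Crude upper bound, as an integer percentage rounded down, on the share
# of slot time spent waiting for the scheduler. Assumes each finished
# task leaves its slot idle for at most INTERVAL_SEC seconds.
idle_percent() {
    local interval=$1 task=$2
    echo $(( 100 * interval / (interval + task) ))
}

# Example with a hypothetical measured rate of 1000 events per second:
# 15-minute (900 s) tasks with the default 15 s schedule_interval.
tasks_for_target 200000000 1000 900
idle_percent 15 900
```

Under these assumptions the example suggests 222 tasks with about 1%
scheduling overhead, while 60-second tasks would lose up to about 20%.
The current scheduler settings can be viewed with qconf -ssconf and
edited with qconf -msconf.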

Rayson

================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/

On Tue, May 15, 2012 at 9:38 AM, Nathan C. Nelson
<Nathan.C.Nelson_at_hitchcock.org> wrote:
> Hi Rayson,
>
> Thank you for your reply.
>
> I don't understand it but after I did a few minor adjustments to my Geant4 program SimpleLinac, I successfully ran the command qsub -t 1-4 testrun.sh where my testrun.sh contains:
>        #!/bin/bash
>        #$ -S /bin/bash
>        cd /home/sgeadmin/SimpleLinac
>        source geant.sh
>        /bin/Linux-g++/SimpleLinac dose1d.mac $SGE_TASK_ID ./results/pdd$SGE_TASK_ID.txt ./results/prof$SGE_TASK_ID.txt
>
> And it worked this time!  My results directory contains the files pdd1.txt, pdd2.txt, pdd3.txt, pdd4.txt, prof1.txt, prof2.txt, prof3.txt and prof4.txt.  Since I want to run SimpleLinac with 200 million events, I need to run the same SimpleLinac executable on n nodes, where for each node I would run 200 million/n simulations, using the command qsub -t 1-n testrun.sh.
> Anyhow, I was going to try to use spot pricing to reduce my costs in the future, since for now I'm just using Amazon's m1.small.  My next naïve question is: if I launched my custom AMI on Amazon instances with multiple processors, say 10 nodes with 4 processors per node, would my command qsub -t 1-40 testrun.sh run SimpleLinac on all 40 processors in my cluster?  Sorry for these basic questions, but I'm trying to learn how to use StarCluster as effectively and economically as possible in my spare time!
>
> Sincerely,
>
> Nathan Nelson
> Email: Nathan.C.Nelson_at_Hitchcock.org
>
>
> -----Original Message-----
> From: Rayson Ho [mailto:raysonlogin_at_gmail.com]
> Sent: Monday, May 14, 2012 6:47 PM
> To: Nathan C. Nelson
> Cc: starcluster_at_mit.edu
> Subject: Re: [StarCluster] Using Array jobs in Starcluster
>
> So *specifically* what is not working?? Also, tell us how you want Grid Engine to do it for your use case...
>
> Rayson
>
>
> On Mon, May 14, 2012 at 10:50 AM, Nathan C. Nelson <Nathan.C.Nelson_at_hitchcock.org> wrote:
>> Hi,
>>
>> I'm trying to learn and use Starcluster to run multiple Geant4 Monte
>> Carlo simulations.  Specifically, I was trying to see if I could use
>> the Sun Grid Engine command qsub -t 1-# myshellscript where:
>>
>> Myshellscript contains:
>>
>> #!/bin/bash
>> My_program $SGE_TASK_ID outputfile.$SGE_TASK_ID
>>
>> I wanted $SGE_TASK_ID to set the seed number for my simulations.  This
>> idea didn't seem to work for me so my question is how can I accomplish
>> this in the Starcluster environment?
>>
>> Sincerely,
>>
>> Chuck Nelson
>> Clinical Physicist
>> Norris Cotton Cancer Center
>> Dartmouth-Hitchcock Medical Center
>> One Medical Center Drive
>> Lebanon, NH 03756-0001
>> Tel. (603) 650-6487
>> Email: Nathan.C.Nelson_at_Hitchcock.org
>>
>> IMPORTANT NOTICE REGARDING THIS ELECTRONIC MESSAGE:
>>
>> This message is intended for the use of the person to whom it is
>> addressed and may contain information that is privileged,
>> confidential, and protected from disclosure under applicable law. If
>> you are not the intended recipient, your use of this message for any
>> purpose is strictly prohibited. If you have received this
>> communication in error, please delete the message and notify the sender so that we may correct our records.
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>
>
>
> --
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
>
>



-- 
==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
Received on Tue May 15 2012 - 10:50:19 EDT
This archive was generated by hypermail 2.3.0.
