StarCluster - Mailing List Archive

Re: Delay when using Sun Grid Engine

From: Justin Riley <no email>
Date: Wed, 17 Oct 2012 14:00:49 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jesse/Rayson,

Sorry for my absence on this. The latest version of OGS is included in
the upcoming 12.04 AMIs. I'm finishing up some testing of the
12.04 AMIs and will release them soon. I'm happy to say that
ge2011.11u1p1 works great.

It's also useful to know about the load_report_time variable, since
I've experienced the same delay in reporting a PE job as finished.
I'll likely tweak this in the default StarCluster SGE setup.

~Justin

On 09/05/2012 07:32 PM, Jesse Lu wrote:
> Hi Rayson,
>
> Let me first say thanks for OGS, it's a super useful tool!
>
> So, an update: I realized that the relevant parameter is
> load_report_time in the global configuration. The delay was
> almost exactly load_report_time, so I have set it to 0, and
> the delay is essentially gone.
>
> Rayson, here is my global configuration (qconf -sconf), any
> comments? Particularly, is it okay to have a value of zero for
> load_report_time?
>
> $ qconf -sconf
> #global:
> execd_spool_dir              /opt/sge6/default/spool
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  none
> prolog                       none
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,bash,ksh,csh,tcsh
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:00
> max_unheard                  00:05:00
> reschedule_unknown           02:00:00
> loglevel                     log_warning
> administrator_mail           none_at_none.edu
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 none
> reporting_params             accounting=false reporting=false \
>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20100
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     0
> auto_user_oticket            0
> auto_user_fshare             0
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 0
> jsv_url                      none
> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>
>
> On Wed, Sep 5, 2012 at 12:52 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>
> On Wed, Sep 5, 2012 at 1:10 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
>> However, if I run in a parallel environment (e.g. qsub -pe orte ...) then
>> there is an approximately 40 sec delay after job completion. That is to say,
>> the job has technically finished, although qstat still lists it as running,
>> and subsequent jobs are held up. Any ideas?
>
> That's fixed in the update release.
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
>>
>> Thanks in advance!
>>
>>
>> On Tue, Sep 4, 2012 at 5:33 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>>>
>>> That's the default scheduling time, and if you really want the
>>> scheduler to react to your qsub requests ASAP, you can turn on
>>> "scheduling-on-demand":
>>>
>>> http://gridscheduler.sourceforge.net/howto/tuning.html
>>>
>>> And in OGS/GE 2011.11 u1 p1 (we need a better name), the time it takes
>>> to report job done should be reduced.
>>>
>>> Rayson
>>>
>>> ==================================================
>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>>
>>>
>>>
>>> On Tue, Sep 4, 2012 at 8:05 PM, Jesse Lu <mr.jesselu_at_gmail.com> wrote:
>>>> Yes! Exactly.
>>>>
>>>> -- Jesse
>>>>
>>>> On Sep 4, 2012 4:19 PM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>>>>
>>>> Hi Jesse,
>>>>
>>>> Are you referring to the scheduling time of Grid Engine??
>>>>
>>>> Rayson
>>>>
>>>> ==================================================
>>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>>> http://gridscheduler.sourceforge.net/
>>>>
>>>>
>>>> On Tue, Sep 4, 2012 at 6:37 PM, Jesse Lu <jesselu_at_stanford.edu> wrote:
>>>>> Hi StarCluster users,
>>>>>
>>>>> I've noticed long delays with Sun Grid Engine when submitting jobs and
>>>>> especially after job execution. Even running a simple "hostname" job
>>>>> takes several seconds. Moreover, running an MPI version of "hostname" can
>>>>> take 2 minutes!!
>>>>>
>>>>> Can someone help me get rid of this delay? Thank you.
>>>>>
>>>>> Jesse
>>>>>
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>>>> StarCluster_at_mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>
>>
>>
>
>
>
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB+8lEACgkQ4llAkMfDcrnyGwCeLtu7X6gljri93H2XHsQVI8HM
0Q4AnAq/tuq9H+2mENE2ZtgzqdlXxS1U
=bn9p
-----END PGP SIGNATURE-----
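
To check the load_report_time setting discussed in this thread on another cluster, a minimal shell sketch follows. It assumes that `qconf -sconf` prints one "name value" pair per line, as in Jesse's listing above. The helper names conf_value and hms_to_seconds are hypothetical and are not part of Grid Engine.

```shell
#!/bin/sh
# Sketch only: conf_value and hms_to_seconds are illustrative helper names,
# not Grid Engine commands.

# Print the value of one parameter from `qconf -sconf`-style text on stdin.
# Assumes each parameter sits on its own line as "name <whitespace> value".
conf_value() {
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }'
}

# Convert an HH:MM:SS interval (the format used by load_report_time and
# max_unheard in the listing above) into a number of seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# On a host where Grid Engine is installed, one might run:
#   qconf -sconf | conf_value load_report_time
#   hms_to_seconds "$(qconf -sconf | conf_value load_report_time)"
```

Comparing the reported interval with the delay seen after a parallel job finishes is one way to confirm whether load_report_time is the cause, as it was for Jesse.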
Received on Wed Oct 17 2012 - 14:00:53 EDT
This archive was generated by hypermail 2.3.0.
