StarCluster - Mailing List Archive

Re: loadbalance

From: Ryan Golhar <no email>
Date: Tue, 29 Oct 2013 22:07:27 -0400

As a follow up:

>>> Loading full job history
Execution hosts: 41
Queued jobs: 165
Oldest queued job: 2013-10-30 01:15:34
Avg job duration: 1541 secs
Avg job wait time: 992 secs
Last cluster modification time: 2013-10-30 01:17:05
>>> Not adding nodes: already at or above maximum (1)
>>> Sleeping...(looping again in 60 secs)

Execution hosts: 41
Queued jobs: 161
Oldest queued job: 2013-10-30 01:15:34
Avg job duration: 0 secs
Avg job wait time: 0 secs
Last cluster modification time: 2013-10-30 01:17:05
>>> Not adding nodes: already at or above maximum (1)
>>> Sleeping...(looping again in 60 secs)




On Tue, Oct 29, 2013 at 10:04 PM, Ryan Golhar
<ngsbioinformatics_at_gmail.com> wrote:

> Hi Rajat,
>
> It's happening again. My jobs are, on average, 1 hr long. I'm attaching
> the qacct output, generated with:
>
> qacct -j -b "10291300" > qacct.out
>
>
>
>
> On Fri, Sep 20, 2013 at 11:28 AM, Rajat Banerjee <rajatb_at_post.harvard.edu> wrote:
>
>> Hi Ryan,
>> Sorry, wrong qacct command. I think I may know what's happening. Are your
>> jobs really long running? I think the 'lookback window' for checking the
>> job history may be too short for you. You could try setting it to at least
>> twice the duration of one of your qsub'd tasks. See how every other line
>> says ">>> Loading full job history"? That comes up because jobstats are
>> empty: 'qacct -j -b <some time>' is coming back empty.
>>
>> Trying to reproduce the behavior from:
>>
>> https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L504
>>
>> Could you send the output from this? Make a date of approximately when
>> you started your cluster, in this format:
>>
>> MMDDhhmm (month, day, hour, minute)
>>
>> qacct -j -b <put that date format>
>>
>> And please paste that qacct output here. That should always have a
>> history of all jobs. Then try the same with the date format being only 3
>> hours ago. You can try toying with the lookback windows. The default is 3
>> hours and you can feed a new one in on the command line:
>>
>> *Lookback window* (-l LOOKBACK_WIN, --lookback_window=LOOKBACK_WIN) - How
>> long, in hours, to look back for past job history
>>
>> Justin Riley, can you please update the doc on this site?
>> http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html
>>
>> It says the window is in minutes but it's in fact in hours.
>>
>> Thanks,
>> Raj
>>
>
>
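
The MMDDhhmm timestamp and the "at least twice the job duration" rule of thumb discussed above can be worked out with a few lines of Python. This is only a sketch: the function names are hypothetical and are not part of StarCluster or SGE, the 3-hour floor simply mirrors the default mentioned in the thread, and `now` is assumed to be read from the same clock the master node uses.

```python
from datetime import datetime, timedelta
import math


def qacct_begin_time(now, lookback_hours):
    """Return the MMDDhhmm string for `now` minus `lookback_hours` (hypothetical helper)."""
    return (now - timedelta(hours=lookback_hours)).strftime("%m%d%H%M")


def suggested_lookback_hours(job_duration_secs, default_hours=3):
    """Lookback of at least twice the job duration, rounded up to whole hours.

    Assumption: never go below the 3-hour default mentioned in the thread.
    """
    return max(default_hours, math.ceil(2 * job_duration_secs / 3600))


if __name__ == "__main__":
    now = datetime(2013, 10, 29, 16, 0)          # example clock reading
    hours = suggested_lookback_hours(2 * 3600)   # e.g. jobs that run about 2 hours
    print("lookback window (hours):", hours)
    print("qacct -j -b", qacct_begin_time(now, hours))
```

The resulting number of hours is what would be passed with -l, and the printed string is what would follow qacct -j -b when checking by hand whether the job history comes back empty.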
Received on Tue Oct 29 2013 - 22:07:28 EDT
This archive was generated by hypermail 2.3.0.
