StarCluster - Mailing List Archive

Re: loadbalance

From: Rajat Banerjee <no email>
Date: Fri, 20 Sep 2013 11:28:10 -0400

Hi Ryan,
Sorry, wrong qacct command. I think I may know what's happening. Are your
jobs really long-running? I suspect the 'lookback window' for checking the
job history may be too short for you. You could try setting it to at least
twice the duration of one of your qsub'd tasks. Notice how every other line
says ">>> Loading full job history"? That comes up because the jobstats are
empty, i.e. 'qacct -j -b <some time>' is coming back empty.
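Here is a minimal Python sketch of the "at least twice the task duration" rule of thumb. The helper name `recommended_lookback_hours` is hypothetical and not part of StarCluster. It assumes the window should be given in whole hours, so it rounds up and never goes below the 3-hour default.

```python
import math

# Default lookback window in hours, as stated in this thread.
DEFAULT_LOOKBACK_HOURS = 3


def recommended_lookback_hours(task_minutes: float) -> int:
    """Return a lookback window (whole hours) of at least 2x one task's runtime.

    Hypothetical helper: the 2x rule comes from this message, not from
    StarCluster's own code.
    """
    # Twice the task duration, converted from minutes to hours and rounded up.
    needed = math.ceil(2 * task_minutes / 60)
    # Never drop below the default window.
    return max(DEFAULT_LOOKBACK_HOURS, needed)


if __name__ == "__main__":
    for minutes in (30, 120, 150, 600):
        print(minutes, "min task ->", recommended_lookback_hours(minutes), "h window")
```

For example, tasks that run about 2.5 hours would call for a window of at least 5 hours.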

I'm trying to reproduce the behavior from:
https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L504

Could you send the output of the following? Take the approximate time you
started your cluster and write it in this format:

MMDDhhmm (month, day, hour, minute)

qacct -j -b <date in MMDDhhmm format>
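If you'd rather not build the timestamp by hand, a small Python sketch can do it. It assumes `qacct -b` accepts the MMDDhhmm form described above and only assembles the command. The function names and the example start time are hypothetical.

```python
from datetime import datetime
from typing import List


def qacct_begin_stamp(when: datetime) -> str:
    """Format a datetime as MMDDhhmm (month, day, hour, minute) for `qacct -b`."""
    return when.strftime("%m%d%H%M")


def qacct_history_cmd(cluster_start: datetime) -> List[str]:
    """Build the argv that lists every job since (roughly) cluster start."""
    return ["qacct", "-j", "-b", qacct_begin_stamp(cluster_start)]


if __name__ == "__main__":
    # Hypothetical start time; substitute your own cluster's.
    start = datetime(2013, 9, 20, 8, 5)
    print(" ".join(qacct_history_cmd(start)))  # qacct -j -b 09200805
```

The returned list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a node where `qacct` is installed (typically the master).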

Please paste that qacct output here; it should contain the history of all
jobs. Then try the same command with the date set to only 3 hours ago. You
can also experiment with the lookback window. The default is 3 hours, and
you can pass a new value on the command line:

*Lookback window* (-l LOOKBACK_WIN, --lookback_window=LOOKBACK_WIN) - How
long, in hours, to look back for past job history.
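Because the window is in hours rather than minutes, it is easy to undershoot. This Python sketch (the helper name `lookback_stamp` is hypothetical) computes the MMDDhhmm start time for a window given in hours, which is what the "3 hours ago" test needs. It also shows how much shorter the window would be if the same number were read as minutes.

```python
from datetime import datetime, timedelta


def lookback_stamp(now: datetime, lookback_hours: float) -> str:
    """MMDDhhmm timestamp for `now` minus a lookback window given in HOURS."""
    return (now - timedelta(hours=lookback_hours)).strftime("%m%d%H%M")


if __name__ == "__main__":
    now = datetime(2013, 9, 20, 11, 28)
    print(lookback_stamp(now, 3))  # 09200828 (3 hours back)
    # Misreading the same value as minutes reaches back only 3 minutes:
    print((now - timedelta(minutes=3)).strftime("%m%d%H%M"))  # 09201125
```

Using `timedelta` keeps day boundaries correct: a 3-hour window taken at 01:00 correctly lands on 22:00 the previous day.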

Justin Riley, can you please update the doc on this site?
http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html

It says the window is in minutes, but it is actually in hours.

Thanks,
Raj
Received on Fri Sep 20 2013 - 11:28:33 EDT