Hi Ryan,
Sorry, that was the wrong qacct command. I think I may know what's happening. Are your
jobs really long-running? The 'lookback window' for checking the
job history may be too short for you. You could try setting it to at least
twice the duration of one of your qsub'd tasks. See how every other line
says ">>> Loading full job history"? That comes up because jobstats are
empty: 'qacct -j -b <some time>' is coming back empty.
I'm trying to reproduce the behavior from:
https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L504
Could you send the output from the following?
Make a timestamp for approximately when you started your cluster, in this
format:
MMDDhhmm (month, day, hour, minute)
Then run:
qacct -j -b <timestamp in that format>
Please paste that qacct output here. It should always contain a history
of all jobs. Then try the same with the timestamp set to only 3 hours ago.
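If typing the MMDDhhmm value by hand is error-prone, here is a minimal Python sketch that builds both qacct calls. The helper names (qacct_begin_time, qacct_command) are made up for illustration, and the "2 days ago" cluster start is only a placeholder. It also assumes the clock where you run it matches the clock on the cluster master.

```python
from datetime import datetime, timedelta


def qacct_begin_time(when):
    """Format a datetime as MMDDhhmm (month, day, hour, minute)."""
    return when.strftime("%m%d%H%M")


def qacct_command(when):
    """Build the 'qacct -j -b <time>' call for jobs started after `when`."""
    return ["qacct", "-j", "-b", qacct_begin_time(when)]


if __name__ == "__main__":
    now = datetime.now()
    # Assumption: replace this with roughly when you started your cluster.
    cluster_start = now - timedelta(days=2)
    three_hours_ago = now - timedelta(hours=3)
    print(" ".join(qacct_command(cluster_start)))
    print(" ".join(qacct_command(three_hours_ago)))
```

Run the two printed commands on the master node and compare: if the first lists your jobs and the second comes back empty, the 3 hour window is too short.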
You can also experiment with the lookback window. The default is 3 hours, and
you can pass a different value on the command line:
*Lookback window* (-l LOOKBACK_WIN, --lookback_window=LOOKBACK_WIN) - How
long, in hours, to look back for past job history
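To pick a value, here is a rough sketch of the "at least twice the duration of one task" rule of thumb mentioned above. The function name is hypothetical, and rounding up to whole hours and never going below the 3 hour default are my own assumptions, not something the balancer enforces.

```python
import math


def suggested_lookback_hours(task_hours, default_hours=3):
    """Return a whole-hour window of at least twice one task's duration.

    Assumption: never suggest less than the 3 hour default.
    """
    return max(default_hours, math.ceil(2 * task_hours))


if __name__ == "__main__":
    # Example: a qsub'd task that runs for about 5 hours.
    print(suggested_lookback_hours(5))
```

Whatever number comes out is what you would pass as the -l value.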
Justin Riley, can you please update the doc on this site?
http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html
It says the window is in minutes but it's in fact in hours.
Thanks,
Raj
Received on Fri Sep 20 2013 - 11:28:33 EDT