StarCluster - Mailing List Archive

Re: Is StarCluster still under active development?

From: Tony Robinson <no email>
Date: Fri, 1 Apr 2016 17:01:24 +0100

On 01/04/16 16:22, Rajat Banerjee wrote:
> Regarding:
> How about we just call qacct every 5 mins, or if the qacct buffer is
> empty.
> Calling qacct and getting the job stats is the first part of the load
> balancer's loop to see what the cluster is up to. I prioritized knowing
> the current state, and keeping the LB running its loop as fast as
> possible (2-10 seconds), so it could run in a 1-minute loop and stay
> roughly on-schedule. It's easy to run the whole LB loop with 5 minutes
> between loops with the command line arg polling_interval, if that
> suits your workload better. I do not mean to sound dismissive, but the
> command line options (with reasonable defaults) are there so you can
> test and tweak to your workload.

Ah, I wasn't very clear. What I mean is that we only update the qacct
stats every 5 minutes. I run the main loop every 30s.
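
A minimal Python sketch of that idea, for illustration only: the main loop can call get() every 30s, but the stats are refetched only when they are more than 5 minutes old or the buffer is empty. ThrottledStats and the fetch callable are hypothetical names, not StarCluster's actual code.

```python
import time


class ThrottledStats:
    """Cache job stats and refresh them at most once per `max_age` seconds."""

    def __init__(self, fetch, max_age=300, clock=time.time):
        self.fetch = fetch        # hypothetical callable that runs qacct and parses it
        self.max_age = max_age    # seconds between refreshes (5 minutes here)
        self.clock = clock        # injectable clock, which makes testing easy
        self.stats = []           # the "qacct buffer"
        self.last_update = None

    def get(self):
        now = self.clock()
        stale = self.last_update is None or now - self.last_update >= self.max_age
        # Refresh when the cached stats are too old or the buffer is empty.
        if stale or not self.stats:
            self.stats = self.fetch()
            self.last_update = now
        return self.stats
```

The main loop would then call get() on every pass and only pay for a qacct call when a refresh is actually due.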

But calling qacct doesn't take any time - we could do it every polling
interval:

root@master:~# date
Fri Apr 1 16:54:31 BST 2016
root@master:~# echo qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m`
qacct -j -b 1604011304
root@master:~# time qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m` | wc
   99506 224476 3307423

real 0m0.588s
user 0m0.560s
sys 0m0.076s
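
One caveat about the command above: in date(1), %m is the month and %M is the minute, which is why the echoed timestamp ends in 04 rather than 54. Subtracting 3 from the hour in bash arithmetic also goes negative before 03:00, drops the leading zero, and fails on 08 and 09 because a leading zero means octal. A hedged Python sketch of the same "three hours back" begin time follows; the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def qacct_begin_time(now, hours_back=3):
    """Return a YYMMDDhhmm string for `qacct -b`, `hours_back` hours before `now`."""
    start = now - timedelta(hours=hours_back)  # handles midnight and month rollover
    return start.strftime("%y%m%d%H%M")        # %M is minutes, %m is the month


def qacct_command(now, hours_back=3):
    """Build the argument list for the qacct call (hypothetical helper)."""
    return ["qacct", "-j", "-b", qacct_begin_time(now, hours_back)]


if __name__ == "__main__":
    print(" ".join(qacct_command(datetime.now())))
```

The argument list can be handed to subprocess as is, so no shell quoting is needed.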

If calling qacct is slow then the update could be run at the end of the
loop so it would have all of the loop wait time to complete in.
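
As a rough sketch of that arrangement, assuming a plain worker thread is acceptable: the update is started at the end of an iteration and collected at the start of the next one, so it runs while the loop sleeps. BackgroundUpdater and the update callable are hypothetical names, not part of StarCluster.

```python
import threading


class BackgroundUpdater:
    """Run a slow update in a worker thread while the main loop sleeps."""

    def __init__(self, update):
        self.update = update   # hypothetical callable, e.g. run qacct and parse it
        self.result = None     # last completed result, None until the first finishes
        self._thread = None

    def _run(self):
        self.result = self.update()

    def start(self):
        """Call at the end of a loop iteration, just before sleeping."""
        # Do not start a second update while one is still running.
        if self._thread is None or not self._thread.is_alive():
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def latest(self, timeout=None):
        """Call at the start of the next iteration; waits up to `timeout` seconds."""
        if self._thread is not None:
            self._thread.join(timeout)
        return self.result
```

In the loop this would read: fetch latest(), do the balancing work, call start(), then sleep for the polling interval.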

> Regarding:
> Three sorts of jobs, all of which should occur in the same numbers,
> Have you tried testing your call to qacct to see if it's returning
> what you want? You could modify it in your source if it's not
> representative of your jobs:
> qacct_cmd = 'qacct -j -b ' + qatime

Yes, thanks, I'm comparing to running qacct outside of the load balancer.

> Obviously one size doesn't fit all here, but if you find a set of args
> for qacct that work better for you, let me know.

At the moment I don't think the output of qacct is used at all, is
it? I thought it was only used to give job stats; I don't think it's
really used to bring nodes up/down.


Speechmatics is a trading name of Cantab Research Limited
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK
Received on Fri Apr 01 2016 - 12:01:33 EDT