StarCluster - Mailing List Archive

Re: Is StarCluster still under active development?

From: Tony Robinson <no email>
Date: Fri, 1 Apr 2016 17:01:24 +0100

On 01/04/16 16:22, Rajat Banerjee wrote:
> Regarding:
> How about we just call qacct every 5 mins, or if the qacct buffer is
> empty.
> calling qacct and getting the job stats is the first part of the load
> balancer's loop to see what the cluster is up to. I prioritized knowing
> the current state, and keeping the LB running its loop as fast as
> possible (2-10 seconds), so it could run in a 1-minute loop and stay
> roughly on-schedule. It's easy to run the whole LB loop with 5 minutes
> between loops with the command line arg polling_interval, if that
> suits your workload better. I do not mean to sound dismissive, but the
> command line options (with reasonable defaults) are there so you can
> test and tweak to your workload.

Ah, I wasn't very clear. What I mean is that we only update the qacct
stats every 5 minutes. I run the main loop every 30s.
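
A minimal sketch of that idea, assuming the stats come from some callable that runs qacct and returns the parsed records. The class name ThrottledStats and the fetch callable are hypothetical and are not part of StarCluster. The main loop can call get() every 30s; the underlying fetch only happens when the cached copy is older than five minutes or empty.

```python
import time


class ThrottledStats:
    """Cache a value and refresh it only when it is stale or empty."""

    def __init__(self, fetch, max_age=300.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable that runs qacct
        self._max_age = max_age    # seconds between refreshes (5 minutes)
        self._clock = clock        # injectable clock, handy for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._max_age)
        # Refresh when the cache is too old, or when the buffer is empty.
        if stale or not self._value:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Note that an empty result is refetched on every call, which matches "or if the qacct buffer is empty" but may not be what you want on an idle cluster.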

But calling qacct doesn't take any time - we could do it every polling
interval:

root_at_master:~# date
Fri Apr 1 16:54:31 BST 2016
root_at_master:~# echo qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m`
qacct -j -b 1604011304
root_at_master:~# time qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m` | wc
   99506 224476 3307423

real 0m0.588s
user 0m0.560s
sys 0m0.076s
root_at_master:~#
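
A hedged aside on the timestamp in the session above: the final field comes from `date +%m`, which is the month, so the argument ends in 04 rather than the minute (54). The shell arithmetic on the hour also goes negative between midnight and 03:00 and does not zero-pad. The sketch below builds the same kind of "three hours ago" value with datetime arithmetic. It assumes minutes were intended in the last field and that qacct -b accepts the YYMMDDhhmm form; the function name and the lookback_hours parameter are made up for illustration.

```python
from datetime import datetime, timedelta


def qacct_begin_time(now, lookback_hours=3):
    """Return a YYMMDDhhmm string for `now` minus `lookback_hours`."""
    # timedelta handles the rollover into the previous day, month or year.
    start = now - timedelta(hours=lookback_hours)
    return start.strftime("%y%m%d%H%M")


print(qacct_begin_time(datetime(2016, 4, 1, 16, 54, 31)))  # 1604011354
```

The result could then be appended to the command string in the same way as the qatime variable quoted later in this message.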


If calling qacct is slow then the update could be run at the end of the
loop so it would have all of the loop wait time to complete in.
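
One way the end-of-loop update could look, as a sketch only: the refresh starts once the balancing pass is done and runs in a background thread while the loop sleeps. The names run_loop, balance_once and refresh_stats are hypothetical; this is not how the StarCluster load balancer is currently structured.

```python
import threading
import time


def run_loop(balance_once, refresh_stats, polling_interval=30.0,
             iterations=None):
    """Run balance_once each interval; refresh stats during the wait."""
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        balance_once()
        # Start the (possibly slow) stats refresh, then sleep alongside it.
        worker = threading.Thread(target=refresh_stats, daemon=True)
        worker.start()
        remaining = polling_interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
        # Make sure the refresh has finished before the next pass reads it.
        worker.join()
        count += 1
```

If the refresh takes longer than the polling interval, the join delays the next pass, so the loop slips rather than reading half-updated stats.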

> Regarding:
> Three sorts of jobs, all of which should occur in the same numbers,
> Have you tried testing your call to qacct to see if it's returning
> what you want? You could modify it in your source if it's not
> representative of your jobs:
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L528
> qacct_cmd = 'qacct -j -b ' + qatime

Yes, thanks, I'm comparing to running qacct outside of the load balancer.

> Obviously one size doesn't fit all here, but if you find a set of args
> for qacct that work better for you, let me know.

At the moment I don't think the output of qacct is used at all, is it?
I thought it was only used to report job stats; I don't think it's
really used to bring nodes up or down.


Tony

-- 
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK
Received on Fri Apr 01 2016 - 12:01:33 EDT