The main issue I have with the load balancer is that bringing up or
taking down a node sometimes fails, and this causes the load balancer
to fall over. This is almost certainly an issue with boto - I just
haven't looked into it enough.
I'm working on the load balancer right now. I'm running a few different
sorts of jobs: some take half a minute, some take five minutes. It
takes me about five minutes to bring a node up, so load balancing is
quite a hard task, and what's there at the moment certainly isn't optimal.
In your master's thesis you had a go at anticipating the future load
based on the queue, although I see no trace of this in the current
code. What seems like the most obvious approach to me is to look at
what's running and in the queue and see if it's all going to complete
within some specified period. If it is, fine. If not, assume you
are going to bring n nodes up (start at n=1) and see if it will
complete then; if not, increment n and try again.
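
To make the idea concrete, here is a rough sketch of that loop. It is not
the code I have running, and none of these names exist in the current load
balancer: estimated_finish(), nodes_to_add(), slots_per_node and
max_new_nodes are all made up for illustration. It assumes you already
have a duration estimate (in seconds) for each queued job and the remaining
time for each running job, and it uses a simple greedy "next free slot"
simulation rather than anything clever.

```python
import heapq


def estimated_finish(running_remaining, queued_durations,
                     idle_slots, new_slots, boot_time):
    """Greedy estimate of the seconds until all known work completes.

    Every slot is represented by the time at which it next becomes free:
    busy slots free up when their running job ends, idle slots are free
    now, and slots on new nodes are free once the node has booted.
    """
    free_at = (list(running_remaining)
               + [0.0] * idle_slots
               + [boot_time] * new_slots)
    if not free_at:
        # No slots at all: queued work can never finish.
        return float("inf") if queued_durations else 0.0
    heapq.heapify(free_at)
    finish = max(running_remaining, default=0.0)
    for duration in queued_durations:
        start = heapq.heappop(free_at)   # earliest slot to become free
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def nodes_to_add(running_remaining, queued_durations, idle_slots, window,
                 boot_time=300.0, slots_per_node=1, max_new_nodes=10):
    """Smallest n for which everything is estimated to finish in `window`.

    Starts at n=0 (no new nodes) and increments n, as described above.
    Returns max_new_nodes if even that many nodes would not be enough.
    """
    for n in range(max_new_nodes + 1):
        finish = estimated_finish(running_remaining, queued_durations,
                                  idle_slots, n * slots_per_node, boot_time)
        if finish <= window:
            return n
    return max_new_nodes


# One job with a minute left, four five-minute jobs queued, no idle slots.
print(nodes_to_add([60.0], [300.0] * 4, idle_slots=0, window=1000.0))
```

The five minute boot_time default matches what I see when bringing a node
up. The whole thing is only as good as the duration estimates fed into it,
which is why avg_job_duration() matters.
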
I've got a version of this running but it isn't finished because
avg_job_duration() consistently under-reports. I'm doing some
debugging and it seems that jobstats has a bug: I have three types of
job, a start, middle and end, and as they are all run in sequence
jobstats should have equal numbers of each. It doesn't.
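
The kind of sanity check I mean looks roughly like this. It is only a
sketch: I'm assuming here that you can get a plain list of job names out of
the stats and that the job type can be read from a name prefix. The
prefixes and the count_by_type() and counts_balanced() helpers are
hypothetical, not part of jobstats.

```python
from collections import Counter


def count_by_type(job_names, types=("start", "middle", "end")):
    """Count recorded jobs per type, matching on an assumed name prefix."""
    counts = Counter()
    for name in job_names:
        for job_type in types:
            if name.startswith(job_type):
                counts[job_type] += 1
                break
    # Always report every type, including those with a zero count.
    return {job_type: counts[job_type] for job_type in types}


def counts_balanced(counts):
    """True when every job type has been recorded the same number of times."""
    return len(set(counts.values())) <= 1


# Hypothetical names: two complete start/middle jobs but only one end job.
names = ["start-1", "middle-1", "end-1", "start-2", "middle-2"]
counts = count_by_type(names)
print(counts, counts_balanced(counts))
```
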
This is a weekend activity for me, so my time on it is unreliable. If
you or anyone else wants to help with:
a) getting avg_job_duration() working which probably means fixing
b) getting a clean simple predictive load balancer working
then please contact me.
On 25/03/16 17:17, Rajat Banerjee wrote:
> I'll fix any issues with the load balancer if they come up.
Speechmatics is a trading name of Cantab Research Limited
Dr A J Robinson, Founder, Cantab Research Ltd
Received on Fri Mar 25 2016 - 15:56:32 EDT