StarCluster - Mailing List Archive

Re: minimal cost with loadbalance

From: MacMullan, Hugh <no email>
Date: Thu, 29 May 2014 14:29:02 +0000

You can also use the method from StephansBlog, which may be easier to implement as a plugin than the seq_no approach:

http://wiki.gridengine.info/wiki/index.php/StephansBlog

As a proof of concept, I did NOT create a new plugin but modified the sge plugin instead (the sge.py template and the sge.py plugin code). That is probably not a great solution in the long run, but it works as expected. Feel free to create your own plugin from these mods. It would be cool if this (or the seq_no bit) were already in StarCluster, so that users would only need to modify their scheduler config to force this 'fill up' behavior.

$ diff templates/sge.py.dist templates/sge.py
88a89,100
>
> sge_exec_template = """
> hostname %s
> load_scaling NONE
> complex_values slots=%s
> user_lists NONE
> xuser_lists NONE
> projects NONE
> xprojects NONE
> usage_scaling NONE
> report_variables NONE
> """
$ diff plugins/sge.py.dist plugins/sge.py
106a107,111
> master = self._master
> execconf = master.ssh.remote_file("/tmp/execconf.txt", "w")
> execconf.write(sge.sge_exec_template % (node.alias, num_slots))
> execconf.close()
> master.ssh.execute('qconf -Me %s' % execconf.name)
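
The template substitution in the diff above can be tried on its own, without a cluster. This is only a sketch: `render_exec_conf` is a hypothetical helper name, not part of StarCluster, and the template text is copied from the diff.

```python
# Same text as the sge_exec_template added in the diff above.
SGE_EXEC_TEMPLATE = """
hostname %s
load_scaling NONE
complex_values slots=%s
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
"""


def render_exec_conf(alias, num_slots):
    """Return the exec host configuration text for one node.

    Hypothetical helper: it only performs the string substitution that the
    modified plugin does before writing the file and calling `qconf -Me`.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    return SGE_EXEC_TEMPLATE % (alias, num_slots)


if __name__ == "__main__":
    print(render_exec_conf("node001", 8))
```

The important line in the output is `complex_values slots=8`, which gives the host itself a slot count that the scheduler's load formula can refer to.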

For it to work, the SGE scheduler configuration needs to be adjusted as well (qconf -msconf). I didn't do that in StarCluster, as this is just a proof of concept and the master stays up anyway:

algorithm default
schedule_interval 0:2:0
maxujobs 0
queue_sort_method load
job_load_adjustments NONE
load_adjustment_decay_time 0:0:0
load_formula slots
schedd_job_info true
flush_submit_sec 1
flush_finish_sec 1
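
If the scheduler change were to be automated rather than made by hand, one possible approach is to rewrite the "key value" lines of an existing configuration dump. The sketch below assumes the text comes from something like `qconf -ssconf` and would be loaded back from a file with `qconf -Msconf`; `patch_sched_conf` and `FILL_UP_SETTINGS` are hypothetical names, and the choice of which four keys to override is an assumption based on the listing above.

```python
# Assumption: these are the settings from the listing above that matter
# for the fill-up behaviour; the other lines are left untouched.
FILL_UP_SETTINGS = {
    "queue_sort_method": "load",
    "job_load_adjustments": "NONE",
    "load_adjustment_decay_time": "0:0:0",
    "load_formula": "slots",
}


def patch_sched_conf(text, overrides=FILL_UP_SETTINGS):
    """Return `text` with each overridden key set to its new value.

    Each line is treated as "key value". Keys missing from `text` are
    appended at the end; all other lines are kept unchanged.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in remaining:
            key = parts[0]
            out.append("%s %s" % (key, remaining.pop(key)))
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append("%s %s" % (key, value))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "algorithm default\nload_formula np_load_avg\n"
    print(patch_sched_conf(current))
```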

Cheers,
-Hugh


From: starcluster-bounces_at_mit.edu [mailto:starcluster-bounces_at_mit.edu] On Behalf Of Rayson Ho
Sent: Thursday, May 29, 2014 7:54 AM
To: David Mrva
Cc: starcluster_at_mit.edu
Subject: Re: [StarCluster] minimal cost with loadbalance

You can set the Grid Engine "queue_sort_method" parameter to "seqno" in sched_conf:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html
For this to work, each instance needs a different "seq_no", so a small StarCluster plugin will need to be developed, i.e. the plugin will assign a new seq_no when an instance gets created.
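
As a rough illustration of the numbering such a plugin might use, the sketch below derives a sequence number from the node alias and then picks the lowest-numbered host that still has a free slot. It assumes StarCluster's default aliases ("master", "node001", "node002", ...); both function names are hypothetical, and the actual qconf call that would set seq_no on each queue instance is deliberately left out.

```python
import re


def seq_no_for_alias(alias):
    """Map a StarCluster node alias to a queue sequence number.

    Assumes the default aliases: "master", then "node001", "node002", ...
    """
    if alias == "master":
        return 0
    match = re.match(r"^node(\d+)$", alias)
    if match is None:
        raise ValueError("unexpected node alias: %r" % alias)
    return int(match.group(1))


def pick_host(free_slots):
    """Return the host with the lowest seq_no that has a free slot, or None.

    free_slots maps host alias -> number of free slots on that host.
    """
    candidates = [h for h, free in free_slots.items() if free > 0]
    if not candidates:
        return None
    return min(candidates, key=seq_no_for_alias)


if __name__ == "__main__":
    print(pick_host({"master": 0, "node001": 0, "node002": 4, "node003": 8}))
```

With this ordering, higher-numbered nodes only receive work once every lower-numbered node is full, so they are the ones most likely to sit idle and be removed.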

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

On Thu, May 29, 2014 at 3:10 AM, David Mrva <davidm_at_cantabresearch.com> wrote:
Hello,

I started using StarCluster with Amazon spot instances. I expect that the
workload of my application will fluctuate a lot and I aim to minimise
the cost of running the spot instances. StarCluster's loadbalancer seems
to go some way in this direction. It adds more spot instances when the
SGE queue is busy and removes unused nodes. The removal of the nodes
interacts with SGE's strategy for assigning jobs to queues. SGE chooses
the node with the lowest load average to assign a job to. If there are
more nodes in the cluster than are necessary to execute the jobs, this
strategy will result in spreading the jobs that need to be executed
across as many nodes as possible. This behaviour reduces the chances of
some of the nodes staying unused and potentially being removed by the
load balancer.

I'd like to configure StarCluster in such a way that SGE jobs go to node
A for as long as there are slots available on it and they go to node B
only if there is no vacant slot on node A. For example, on a cluster
with nodes A and B and 8 slots on each node if there are 4 slots being
used on node A and 4 more jobs arrive to SGE, I'd like all 4 of these
new jobs to go to node A. Using the "orte" parallel environment with
"fill_up" allocation strategy does not achieve this. For the above
example, using the "fill_up" allocation strategy will pick node B
(lowest load average node) and assign all 4 new jobs to it, resulting in
nodes A and B running 4 jobs each instead of A running 8 jobs and B none.
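
The two-node example above can be reproduced with a toy model. This is a simplified sketch, not how SGE computes load: it uses the number of slots in use as a stand-in for load average, places single-slot jobs one at a time, and `place_jobs` with its "spread" and "pack" policy names are made up for illustration.

```python
def place_jobs(used, capacity, new_jobs, policy):
    """Assign `new_jobs` single-slot jobs one at a time; return the new usage.

    used:     dict of host -> slots currently in use
    capacity: slots per host
    policy:   "spread" picks the least-used host, "pack" picks the
              most-used host that still has a free slot.
    Ties are broken by host name.
    """
    used = dict(used)
    for _ in range(new_jobs):
        free = [h for h in sorted(used) if used[h] < capacity]
        if not free:
            break  # cluster is full; remaining jobs would wait in the queue
        if policy == "pack":
            host = max(free, key=lambda h: used[h])
        else:
            host = min(free, key=lambda h: used[h])
        used[host] += 1
    return used


if __name__ == "__main__":
    start = {"A": 4, "B": 0}
    print(place_jobs(start, 8, 4, "spread"))
    print(place_jobs(start, 8, 4, "pack"))
```

Under "spread" both nodes end up half full, while under "pack" node B stays empty and becomes a candidate for removal by the load balancer.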

How can I use StarCluster's built-in load balancer to minimise the cost
of running spot instances by minimising the number of unused CPUs in the
way described above?

Many thanks,
David
_______________________________________________
StarCluster mailing list
StarCluster_at_mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster

Received on Thu May 29 2014 - 10:29:06 EDT