StarCluster - Mailing List Archive

StarCluster LoadBalancer

From: Sergio Mafra <no email>
Date: Sat, 9 Feb 2013 13:47:43 -0200

Hi fellows,

I have a cluster of 5 nodes (cc1.4xlarge) running two jobs, each one with
40 nodes.
I'm trying to use the load balancer to kill the cluster after the jobs are done.
One strange thing is that the jobs are clearly running in the queue, as you
can see here:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/8/16         0.42     linux-x64
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/8/1          -NA-     linux-x64     auo
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
---------------------------------------------------------------------------------
all.q@node002                  BIP   0/8/1          -NA-     linux-x64     auo
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
---------------------------------------------------------------------------------
all.q@node003                  BIP   0/8/1          -NA-     linux-x64     auo
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
---------------------------------------------------------------------------------
all.q@node004                  BIP   0/8/1          -NA-     linux-x64     auo
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8

If I issue this command in StarCluster:

$ starcluster loadbalance newave -n 1

this is what I get:

>>> Loading full job history
Execution hosts: 5
Queued jobs: 0
Avg job duration: 0 secs
Avg job wait time: 0 secs
Last cluster modification time: 2013-02-09 15:32:21
>>> Not adding nodes: already at or above maximum (5)
>>> Looking for nodes to remove...
>>> No nodes can be removed at this time
>>> Sleeping...(looping again in 60 secs)

It seems that the load balancer didn't get the right average job duration
(it also reports 0 queued jobs) and could wrongly kill the cluster, even
though there are jobs running.
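
As a cross-check that does not depend on the load balancer's statistics, here is a minimal Python sketch that reads `qstat -f` style text like the queue listing in this message and reports the used slots per queue instance. The names `used_slots` and `cluster_is_busy` and the embedded sample text are illustrative assumptions, not part of StarCluster or SGE, and the parsing assumes the column layout shown in the listing.

```python
import re

# Matches a queue-instance line of `qstat -f` style output, for example:
#   all.q@master  BIP  0/8/16  0.42  linux-x64
# Captures the queue name and the resv/used/tot. slot counts.
QUEUE_LINE = re.compile(r"^(\S+)\s+\S+\s+(\d+)/(\d+)/(\d+)\s")


def used_slots(qstat_text):
    """Return {queue_instance: used_slots} parsed from `qstat -f` style text."""
    used = {}
    for line in qstat_text.splitlines():
        match = QUEUE_LINE.match(line)
        if match:
            # group(3) is the "used" field of resv/used/tot.
            used[match.group(1)] = int(match.group(3))
    return used


def cluster_is_busy(qstat_text):
    """True if any queue instance reports at least one used slot."""
    return any(count > 0 for count in used_slots(qstat_text).values())


# Hypothetical sample, modeled on the first two queue instances in the listing.
SAMPLE = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/8/16         0.42     linux-x64
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/8/1          -NA-     linux-x64     auo
      2 0.55500 serra85    sgeadmin     r     02/09/2013 11:52:17     8
"""

if __name__ == "__main__":
    print(used_slots(SAMPLE))
    print(cluster_is_busy(SAMPLE))
```

If a check like this reports used slots while the load balancer shows 0 queued jobs and a 0 second average duration, the two views disagree, which is the situation described here.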

All the best,

Sergio
Received on Sat Feb 09 2013 - 10:47:45 EST