StarCluster - Mailing List Archive

StarCluster LoadBalancer

From: Sergio Mafra <no email>
Date: Wed, 23 Jan 2013 11:39:56 -0200

Hi fellows,

I started testing StarCluster's load balancer, but something is not working.
The load balancer tells me that there are no jobs in the OGE queue.
Here is the full history:

1- Launched a 5-node cluster (Ubuntu HVM) of cc1.4xlarge instances
-> Used the mpich2 plugin (mpich2 v1.4.1 native)

2- Submitted an application job to OGE (mympiapp):
$ qsub -N Newaveinth -b y -pe orte 80 -cwd mpiexec -n 80 mympiapp

3- Checked the queue:

job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48 all.q@master      80

sgeadmin@master:~/pmo0113$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/16/16        3.06     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/16/16        2.87     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node002                  BIP   0/16/16        2.74     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node003                  BIP   0/16/16        2.86     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node004                  BIP   0/16/16        2.11     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
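
One way to cross-check the counts in the qstat output above is to tally job states and slots from the job lines. The sketch below is only an illustration and is not how StarCluster's balancer parses the queue. It assumes the usual SGE state codes (`r` for running, `qw` for queued and waiting), and the names `summarize_jobs` and `count_by_state` are hypothetical.

```python
import re

# Matches job lines from `qstat -f`, for example:
#   "      2 0.55500 Newaveinth sgeadmin r 01/23/2013 13:29:48 16"
JOB_LINE = re.compile(
    r"^\s*(?P<job_id>\d+)\s+\d+\.\d+\s+(?P<name>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<state>\S+)\s+\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<slots>\d+)\s*$"
)


def summarize_jobs(qstat_text):
    """Return {job_id: {"state": str, "slots": int}}, summing slots over hosts."""
    jobs = {}
    for line in qstat_text.splitlines():
        m = JOB_LINE.match(line)
        if m is None:
            continue  # queue header, separator, or blank line
        entry = jobs.setdefault(m["job_id"], {"state": m["state"], "slots": 0})
        entry["slots"] += int(m["slots"])
    return jobs


def count_by_state(jobs):
    """Return (running, pending) job counts.

    Simplification (assumption): only plain "r" counts as running, and any
    state containing "qw" counts as pending.
    """
    running = sum(1 for j in jobs.values() if j["state"] == "r")
    pending = sum(1 for j in jobs.values() if "qw" in j["state"])
    return running, pending


if __name__ == "__main__":
    sample = """\
all.q@master BIP 0/16/16 3.06 linux-x64
      2 0.55500 Newaveinth sgeadmin r 01/23/2013 13:29:48 16
---------------------------------------------------------------------------------
all.q@node001 BIP 0/16/16 2.87 linux-x64
      2 0.55500 Newaveinth sgeadmin r 01/23/2013 13:29:48 16
"""
    jobs = summarize_jobs(sample)
    print(jobs)
    print(count_by_state(jobs))
```

Run against the full five-host listing, job 2 adds up to 80 slots in state `r`, so under these assumptions it counts as one running job and zero pending jobs.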

4- Tried to load balance the cluster, which was launched with 5 nodes:

ubuntu@ip-10-112-98-159:~$ starcluster loadbalance mycluster --max_nodes=6
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 6
Minimum cluster size: 1
Cluster growth rate: 1 nodes/iteration

>>> Loading full job history
Execution hosts: 5
Queued jobs: 0
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)

>>> Loading full job history
Execution hosts: 5
Queued jobs: 0 (<-- Why is the queued job count 0 ???)
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)

^C (<-- Ctrl-C and a lot of messages...)
Traceback (most recent call last):
  File "/usr/local/bin/starcluster", line 9, in <module>
    load_entry_point('StarCluster==0.93.3', 'console_scripts', 'starcluster')()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 312, in main
    StarClusterCLI().main()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 255, in main
    sc.execute(args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute
    lb.run(cluster)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/balancers/sge/__init__.py", line 619, in run
    time.sleep(self.polling_interval)
KeyboardInterrupt
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/local/lib/python2.7/dist-packages/ssh-1.7.13-py2.7.egg/ssh/transport.py", line 1602, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'error'
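
The "Waiting for cluster to stabilize" messages in step 4 suggest a wait loop of roughly the following shape. This is a minimal sketch based only on the 180-second and 60-second values printed in the log, not StarCluster's actual code. The function names and the first poll time are hypothetical.

```python
from datetime import datetime, timedelta

STABILIZATION_TIME = timedelta(seconds=180)  # "modified less than 180 seconds ago"
POLLING_INTERVAL = timedelta(seconds=60)     # "looping again in 60 secs"


def is_stabilizing(last_modified, now, window=STABILIZATION_TIME):
    """True while the cluster was modified less than `window` ago."""
    return now - last_modified < window


def polls_until_stable(last_modified, first_poll,
                       interval=POLLING_INTERVAL, window=STABILIZATION_TIME):
    """Count how many polls land inside the stabilization window."""
    polls = 0
    now = first_poll
    while is_stabilizing(last_modified, now, window):
        polls += 1
        now += interval
    return polls


if __name__ == "__main__":
    last_modified = datetime(2013, 1, 23, 13, 32, 33)  # from the log
    first_poll = datetime(2013, 1, 23, 13, 33, 0)      # assumed, not in the log
    print(polls_until_stable(last_modified, first_poll))
```

With that assumed first poll time, three polls would fall inside the window before the balancer could act on the queue at all.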

All the best,

Sergio
Received on Wed Jan 23 2013 - 08:39:58 EST
This archive was generated by hypermail 2.3.0.
