Hi fellows,
I started testing StarCluster's (SC) load balancer, but something is not working well.
The load balancer tells me that there are no jobs in OGE's queue.
Here is the full history:
1- Launched a 5-node cluster (Ubuntu HVM) of cc1.4xlarge instances
-> Used the mpich2 plugin (mpich2 v1.4.1 native)
2- Submitted an application job to OGE (mympiapp):
$ qsub -N Newaveinth -b y -pe orte 80 -cwd mpiexec -n 80 mympiapp
3- Checked the queue:
job-ID  prior    name        user      state  submit/start at      queue         slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
2       0.55500  Newaveinth  sgeadmin  r      01/23/2013 13:29:48  all.q@master  80
sgeadmin@master:~/pmo0113$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/16/16        3.06     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/16/16        2.87     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node002                  BIP   0/16/16        2.74     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node003                  BIP   0/16/16        2.86     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node004                  BIP   0/16/16        2.11     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
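For reference, here is a minimal Python sketch that tallies the state column of the plain qstat listing in step 3. It is not part of StarCluster, and the function name count_job_states is made up for illustration. It assumes the usual OGE state codes, where "r" means running and "qw" means pending in the queue. Whether the load balancer's "Queued jobs" figure counts only "qw" jobs is my assumption, not something confirmed here.

```python
from collections import Counter

# Job row copied from the plain `qstat` listing in step 3
# (header and separator lines omitted).
QSTAT_ROWS = """\
2 0.55500 Newaveinth sgeadmin r 01/23/2013 13:29:48 all.q@master 80
"""


def count_job_states(rows):
    """Return a Counter mapping OGE state codes (e.g. 'r', 'qw') to job counts.

    Hypothetical helper for illustration only; assumes whitespace-separated
    columns in the order: job-ID, prior, name, user, state, ...
    """
    states = Counter()
    for line in rows.splitlines():
        fields = line.split()
        # Only lines that start with a numeric job-ID are job rows.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[4]] += 1
    return states


states = count_job_states(QSTAT_ROWS)
# A Counter returns 0 for state codes that never appeared.
print("running:", states["r"], "pending:", states["qw"])
```

With the single row above, this reports one running job and no pending ones, which would be consistent with a zero queued count if only pending jobs are counted.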
4- Tried to load balance the cluster launched with 5 nodes:
ubuntu@ip-10-112-98-159:~$ starcluster loadbalance mycluster --max_nodes=6
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu
>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 6
Minimum cluster size: 1
Cluster growth rate: 1 nodes/iteration
>>> Loading full job history
Execution hosts: 5
Queued jobs: 0
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)
>>> Loading full job history
Execution hosts: 5
Queued jobs: 0 (<-- Why are there 0 queued jobs ???)
Avg job duration: 1699 secs
Avg job wait time: 8 secs
Last cluster modification time: 2013-01-23 13:32:33
>>> Cluster was modified less than 180 seconds ago
>>> Waiting for cluster to stabilize...
>>> Sleeping...(looping again in 60 secs)
^C (<-- Ctrl-C and a lot of messages...)
Traceback (most recent call last):
  File "/usr/local/bin/starcluster", line 9, in <module>
    load_entry_point('StarCluster==0.93.3', 'console_scripts', 'starcluster')()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 312, in main
    StarClusterCLI().main()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 255, in main
    sc.execute(args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute
    lb.run(cluster)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/balancers/sge/__init__.py", line 619, in run
    time.sleep(self.polling_interval)
KeyboardInterrupt
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
  File "/usr/local/lib/python2.7/dist-packages/ssh-1.7.13-py2.7.egg/ssh/transport.py", line 1602, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'error'
All the best,
Sergio
Received on Wed Jan 23 2013 - 08:39:58 EST