StarCluster - Mailing List Archive

error of loadbalance ( can not list current job )

From: Kai Li <no email>
Date: Sun, 24 Feb 2013 01:32:00 +0100

Hi,

When I run "starcluster loadbalance" with StarCluster, I get the following
error message:

>>> Loading full job history
*** WARNING - Failed to retrieve stats (5/5):
Traceback (most recent call last):
  File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
    self.stat = self._get_stats()
  File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 493, in _get_stats
    qacct = '\n'.join(master.ssh.execute(qacct_cmd))
  File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 538, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && qacct -j -b 201302232051' failed with status 1:
no jobs running since startup
/opt/sge6/default/common/accounting: No such file or directory
*** WARNING - Retrying in 60s
!!! ERROR - Failed to retrieve SGE stats after trying 5 times,
!!! ERROR - exiting...


I also ran "qacct -j -b 201302232046" directly on the master node and got
the same error: "/opt/sge6/default/common/accounting: No such file or
directory".
Can anyone give me a hint on how to fix it? Thanks!
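
For reference, the value passed to -b in the failing command appears to be a
start time in CCYYMMDDhhmm form, so 201302232051 would mean 20:51 on
23 Feb 2013. Below is a minimal Python sketch that decodes that value and
checks whether the accounting file named in the error exists. It assumes it
is run on the master node; the helper names parse_begin_time and
begin_time_from_command are made up for illustration and are not part of
StarCluster or SGE.

```python
import os
import re
from datetime import datetime

# Path copied from the error message in this post.
ACCOUNTING_FILE = "/opt/sge6/default/common/accounting"


def parse_begin_time(stamp):
    """Decode a 12-digit qacct -b value, assumed to be CCYYMMDDhhmm."""
    return datetime.strptime(stamp, "%Y%m%d%H%M")


def begin_time_from_command(command):
    """Extract the -b argument from a qacct command line (hypothetical helper).

    Returns None when no 12-digit -b value is present.
    """
    match = re.search(r"-b\s+(\d{12})\b", command)
    return parse_begin_time(match.group(1)) if match else None


if __name__ == "__main__":
    cmd = "source /etc/profile && qacct -j -b 201302232051"
    print("history requested since:", begin_time_from_command(cmd))
    print("accounting file present:", os.path.exists(ACCOUNTING_FILE))
```

If the second line prints False, that matches the "No such file or
directory" part of the error above.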

-- 
李凯 ( Kai Li )
Received on Sat Feb 23 2013 - 19:32:01 EST
