StarCluster - Mailing List Archive

Re: error of loadbalance ( can not list current job )

From: Kai Li <no email>
Date: Wed, 27 Feb 2013 00:44:04 +0100

Hi Ron,

Thanks for your help! You were right. Once I had finished one job, the file
/opt/sge6/default/common/accounting was created.
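
For anyone who lands on the same error, here is a minimal sketch of how the
missing accounting file could be handled before calling qacct. It is not how
StarCluster does it; the function names accounting_ready and job_history are
made up for illustration, and it assumes it runs on the master node with qacct
already on the PATH (StarCluster itself sources /etc/profile first, as the
traceback shows).

```python
import os
import subprocess

# Path taken from the error message in this thread.
ACCOUNTING_FILE = "/opt/sge6/default/common/accounting"


def accounting_ready(path=ACCOUNTING_FILE):
    """Return True once SGE has written its accounting file.

    The file only appears after at least one job has finished.
    """
    return os.path.isfile(path)


def job_history(begin_time, path=ACCOUNTING_FILE):
    """Return qacct output for jobs since begin_time (e.g. "201302232051").

    Returns an empty string instead of failing when no job has finished yet.
    """
    if not accounting_ready(path):
        return ""
    # Same qacct invocation that appears in the traceback.
    out = subprocess.check_output(["qacct", "-j", "-b", begin_time])
    return out.decode()
```

On a fresh cluster, job_history("201302232051") would simply return an empty
string until the first job completes, rather than exiting with status 1.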



On Mon, Feb 25, 2013 at 12:04 AM, Ron Chen <ron_chen_123_at_yahoo.com> wrote:

> What is the output of qstat and qacct when run without any arguments?
> And has your cluster finished running any jobs?
>
> The file /opt/sge6/default/common/accounting only exists once at least
> one job has finished running.
>
> -Ron
>
> ************************************************************************
> Open Grid Scheduler - the official open source Grid Engine:
> http://gridscheduler.sourceforge.net/
>
>
>
> ________________________________
> From: Kai Li <kai.li.jx_at_gmail.com>
> To: starcluster_at_mit.edu
> Sent: Saturday, February 23, 2013 7:32 PM
> Subject: [StarCluster] error of loadbalance ( can not list current job )
>
>
> Hi,
>
> When I use StarCluster, I get the following error message when I try to
> run "starcluster loadbalance":
>
>
> >>> Loading full job history
> *** WARNING - Failed to retrieve stats (5/5):
> Traceback (most recent call last):
>   File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 515, in get_stats
>     self.stat = self._get_stats()
>   File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/balancers/sge/__init__.py", line 493, in _get_stats
>     qacct = '\n'.join(master.ssh.execute(qacct_cmd))
>   File "/home/kli/.local/lib/python2.7/site-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils/__init__.py", line 538, in execute
>     msg, command, exit_status, out_str)
> RemoteCommandFailed: remote command 'source /etc/profile && qacct -j -b 201302232051' failed with status 1:
> no jobs running since startup
> /opt/sge6/default/common/accounting: No such file or directory
> *** WARNING - Retrying in 60s
> !!! ERROR - Failed to retrieve SGE stats after trying 5 times,
> !!! ERROR - exiting...
>
>
>
> I've also tried qacct -j -b 201302232046 on the master node and got the
> same error message: "/opt/sge6/default/common/accounting: No such file or
> directory". Can anyone give me a hint on how to fix it? Thanks!
>
> --
> 李凯 ( Kai Li )
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>



-- 
李凯 ( Kai Li )
Received on Tue Feb 26 2013 - 18:44:05 EST
