StarCluster - Mailing List Archive

Re: commlib error

From: Rajat Banerjee <no email>
Date: Tue, 23 Sep 2014 09:33:33 -0400

Hi Amanda,
It looks like you cannot communicate with the master node anymore. The
error message appears because StarCluster failed to execute a simple
'source /etc/profile && qhost -xml' command, which returned a
'connection refused' error.
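
To check the 'connection refused' symptom directly, without going through
SGE, you can test whether anything is accepting TCP connections on the
qmaster port. The following is a minimal Python 3 sketch and is not part
of StarCluster. The function name port_accepts_connections is
hypothetical, and the host name "master" and port 63231 are assumptions
taken from the error message quoted below.

```python
import socket


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds.

    False covers 'Connection refused' (nothing listening on the port,
    e.g. the qmaster daemon is not running), connect timeouts, and
    host names that cannot be resolved.
    """
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror are
        # all subclasses of OSError in Python 3.
        return False
```

Run on the master node (or another node in the cluster), a False result
from port_accepts_connections("master", 63231) would be consistent with
nothing listening on the qmaster port.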

Can you paste us the output of the following two commands:

> starcluster listclusters (should list the status of all your active clusters and running nodes)

> starcluster sshmaster <your cluster name> (I'm expecting this to fail)

Raj

On Mon, Sep 22, 2014 at 5:13 PM, Amanda Joy Kedaigle <mandyjoy_at_mit.edu> wrote:

> Hi,
>
> I am trying to run starcluster's loadbalancer to keep only one node
> running until jobs are submitted to the cluster. I know it's an
> experimental feature, but I'm wondering if anyone has run into this error
> before, or has any suggestions. The cluster has been whittled down to 1
> node after a weekend of inactivity, and now it seems that when jobs are
> submitted to the queue, instead of adding nodes, SGE fails.
>
> >>> Loading full job history
> *** WARNING - Failed to retrieve stats (1/5):
> Traceback (most recent call last):
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 552, in get_stats
>     return self._get_stats()
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 522, in _get_stats
>     qhostxml = '\n'.join(master.ssh.execute('qhost -xml'))
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/sshutils.py", line 578, in execute
>     msg, command, exit_status, out_str)
> RemoteCommandFailed: remote command 'source /etc/profile && qhost -xml' failed with status 1:
> error: commlib error: got select error (Connection refused)
> error: unable to send message to qmaster using port 63231 on host "master": got send error
>
> Thanks for any help!
> Amanda
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
Received on Tue Sep 23 2014 - 09:33:55 EDT