StarCluster - Mailing List Archive

queue list doesn't match /etc/hosts?

From: David Koppstein <no email>
Date: Fri, 26 Jun 2015 17:20:41 +0000

Hi,

I recently noticed that my StarCluster cluster stopped dispatching jobs.
I had disabled the queue on the master node using
`/opt/sge6/bin/linux-x64/qmod -d all.q@master`, but until recently jobs
were still being sent to the other nodes without any problems.

Interestingly, here is my output of `qstat -f`:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/0/8          1.01     linux-x64     d

So there's only one queue instance listed, and it is disabled. However,
/etc/hosts contains:

10.0.0.85 master
10.0.0.80 node018
10.0.0.124 node025
10.0.0.139 node039

So for some reason, no queue instances are registered for the other nodes,
even though the nodes exist and are associated with the cluster, as shown
by, for example, `starcluster listclusters`:

Cluster nodes:
     master running i-3d8a8cc2 52.7.83.124
    node018 running i-4331d5bd 52.0.84.150
    node025 running i-e9e10717 52.6.226.185
    node039 running i-e11afc1f 54.175.131.15
Total nodes: 4
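To make the comparison above systematic, you could diff the hostnames in /etc/hosts against the hosts that actually have an `all.q` instance in the `qstat -f` output. The snippet below is a minimal Python sketch, not part of StarCluster or SGE. The function names are hypothetical, the sample strings are copied from this post, and it assumes that queue instances appear as `queue@host` at the start of a line (the usual `qstat -f` layout) and that short hostnames are comparable between the two sources.

```python
import re

# Sample data copied from this post; in practice, read /etc/hosts and
# capture the output of `qstat -f` yourself.
HOSTS_TEXT = """\
10.0.0.85 master
10.0.0.80 node018
10.0.0.124 node025
10.0.0.139 node039
"""

QSTAT_TEXT = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/0/8          1.01     linux-x64     d
"""

# A queue instance line starts with "<queue>@<host>"; headers, separators,
# and indented job lines do not match.
QUEUE_INSTANCE = re.compile(r"^(?P<queue>[^\s@]+)@(?P<host>\S+)")


def hosts_from_etc_hosts(text):
    """Return the first hostname on each non-comment, non-loopback line."""
    hosts = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue
        ip, name = fields[0], fields[1]
        if ip.startswith("127.") or ip == "::1":
            continue  # skip loopback entries such as localhost
        hosts.add(name.split(".")[0])  # compare short hostnames
    return hosts


def hosts_from_qstat_f(text, queue="all.q"):
    """Return the hosts that have an instance of `queue` in `qstat -f` output."""
    hosts = set()
    for line in text.splitlines():
        match = QUEUE_INSTANCE.match(line)
        if match and match.group("queue") == queue:
            hosts.add(match.group("host").split(".")[0])
    return hosts


def missing_queue_instances(hosts_text, qstat_text, queue="all.q"):
    """Hosts listed in /etc/hosts that have no instance of `queue`."""
    return sorted(hosts_from_etc_hosts(hosts_text) - hosts_from_qstat_f(qstat_text, queue))


if __name__ == "__main__":
    print(missing_queue_instances(HOSTS_TEXT, QSTAT_TEXT))
```

With the sample data from this post it prints `['node018', 'node025', 'node039']`, i.e. every node except master is missing from `all.q`.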

I'm also running a load balancer on the cluster, if that's relevant. Has
anyone seen this before, or does anyone know what might cause it?

Cheers,
David
Received on Fri Jun 26 2015 - 13:21:39 EDT
This archive was generated by hypermail 2.3.0.
