queue list doesn't match /etc/hosts?
Hi,
I recently noticed that my StarCluster cluster stopped submitting jobs.
I had disabled the queue on the master node using
`/opt/sge6/bin/linux-x64/qmod -d all.q@master`, but jobs were still being
submitted to the other nodes without problems until recently.
Interestingly, this is my output of `qstat -f`:
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/0/8          1.01     linux-x64     d
So there's only one queue available (which is disabled). However, in
/etc/hosts, I see
10.0.0.85 master
10.0.0.80 node018
10.0.0.124 node025
10.0.0.139 node039
So for some reason, the queues for these other nodes aren't registered, even
though the nodes exist and StarCluster still reports them as part of the
cluster when I run, for example, `starcluster listclusters`:
Cluster nodes:
master running i-3d8a8cc2 52.7.83.124
node018 running i-4331d5bd 52.0.84.150
node025 running i-e9e10717 52.6.226.185
node039 running i-e11afc1f 54.175.131.15
Total nodes: 4
I'm also running a load balancer on the cluster, if that's relevant. Has
anyone seen this, or does anyone know what might cause it?
Cheers,
David
Received on Fri Jun 26 2015 - 13:21:39 EDT