Hi,
I've been using the loadbalancer on a small cluster (up to 5 execute
nodes + the master). The nodes are c3.8xlarge. It spins nodes up and
configures SGE fine, but automatic removal of nodes when the load goes
down is not working properly.
All of the nodes were removed from SGE as execute nodes, but the nodes
themselves were left running. In addition, when I tried to run
removenode manually, it generated errors, so I had to forcibly remove
the nodes with removenode -f.
starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster_at_mit.edu
0.95.6
The master node is running:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
Unfortunately, it looks like my debug logs have been rotated, so I
don't have a log from the time the problem happened. Has anyone else
run into this? If so, do you know what causes it and how to avoid it?
Thanks,
Herc
Received on Thu Nov 19 2015 - 13:06:08 EST