[Star cluster] error tolerance design when adding nodes
For example, I have found that it is not uncommon for one or two
instances to be unreachable after adding 50 instances to the cluster.
The progress bar gets stuck while waiting for SSH, and I have to manually
restart the problematic instances.
I have not yet gone through the StarCluster code, so I wonder:
does StarCluster already have some error tolerance design for this situation?
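
To make the question concrete, here is a minimal sketch of the kind of check I have in mind: probe each new instance's SSH port with a timeout and collect the ones that never answer, instead of waiting indefinitely. It uses only plain Python sockets and is not StarCluster's API; the function names `ssh_port_open` and `find_unreachable`, as well as the timeout and retry values, are hypothetical.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_unreachable(hosts, port=22, timeout=5.0, retries=3):
    """Return the hosts that did not accept a connection in any of `retries` attempts."""
    unreachable = []
    for host in hosts:
        # any() stops at the first successful attempt for this host.
        if not any(ssh_port_open(host, port, timeout) for _ in range(retries)):
            unreachable.append(host)
    return unreachable
```

The hosts returned by `find_unreachable` would be the candidates for an automatic restart or replacement, which is the step I currently do by hand.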
Received on Sun Jul 20 2014 - 15:08:01 EDT