Restarting bad nodes

From: Suchindra Sandhu <no email>
Date: Mon, 24 Jun 2013 09:36:54 +0200

Over the last week, I have frequently run into errors while
using the addnode command to add nodes to an existing cluster. Most
of the time it works out of the box, but sometimes I get NFS- or
SSH-related errors, which I am guessing are due to EC2 stability
issues. Unfortunately, since the errors happen at the configuration
stage, some nodes are left hanging.

Is there any way to restart just the bad nodes in a cluster? I
can manually terminate them from the AWS console and restart the
cluster, but that takes a lot of time for relatively large
clusters. Also, sometimes I do not want to interrupt the computation
on the existing nodes, so restarting all the nodes does not seem
like a great option.

I would appreciate suggestions/tricks/workarounds to deal with this
issue.

Thanks!

Suchindra
Received on Mon Jun 24 2013 - 03:36:56 EDT

This archive was generated by hypermail 2.3.0.
