StarCluster - Mailing List Archive

Amazon EC2 bug?

From: François-Michel L'Heureux <no email>
Date: Mon, 15 Dec 2014 17:08:37 -0500

I want to share this with you.

Looking at a running StarCluster install, I saw a weird error fly by my
console.

ERROR - InvalidInstanceID.NotFound: The instance ID 'i-fc67821c' does not
exist.

I looked up the instance ID and it was a valid instance, but it belonged to
another StarCluster installation. (We run more than one cluster.)

I then grepped the logs of the last 5 days and found a few more of those
errors. It's always the same thing: a cluster adds some nodes and, for a
short time window, other clusters get those nodes via the "def nodes"
property of cluster.py
<https://github.com/jtriley/StarCluster/blob/50894f517837eb6b9a68f3e45ac7649e9c78c467/starcluster/cluster.py#L751>.

My guess is that the security group filter is not always honoured for some
unknown reason. Amazon EC2 issue? Boto version issue? The former is more
likely since the error is transient.

Has anyone else running multiple clusters seen this error?

In the meantime, I'll see if I can develop a patch to filter the results
and remove the nodes that shouldn't have been returned.
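
A rough sketch of the kind of client-side filtering such a patch could do.
This is not the StarCluster code: the Group and Instance classes below are
hypothetical stand-ins that model only a name and a list of groups, and the
group names and the first instance ID are made up for illustration. It
assumes each returned instance exposes the security groups it belongs to.

```python
from collections import namedtuple

# Hypothetical stand-ins for the EC2 objects; only the attributes used
# below are modelled. Real instance objects may look different.
Group = namedtuple("Group", ["name"])
Instance = namedtuple("Instance", ["id", "groups"])


def filter_by_security_group(instances, group_name):
    """Split instances into (kept, dropped) by security group membership."""
    kept, dropped = [], []
    for inst in instances:
        names = {g.name for g in inst.groups}
        if group_name in names:
            kept.append(inst)
        else:
            # Returned by the query but not a member of our group.
            dropped.append(inst)
    return kept, dropped


# Example data: one node of our own cluster, one that leaked in from another.
instances = [
    Instance("i-aaaa1111", [Group("@sc-cluster-a")]),
    Instance("i-fc67821c", [Group("@sc-cluster-b")]),
]
kept, dropped = filter_by_security_group(instances, "@sc-cluster-a")
```

Returning the dropped instances separately would make it possible to log
them, which might help tell how often the filter is being ignored.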

Cheers
Mich
Received on Mon Dec 15 2014 - 17:08:59 EST