StarCluster - Mailing List Archive

Re: StarCluster Digest, Vol 22, Issue 1

From: Justin Riley <no email>
Date: Thu, 04 Aug 2011 13:03:57 -0400

Hi Don,

On 07/27/2011 01:56 AM, Don MacMillen wrote:
> Yes, I thought it must be an 'eventual consistency' issue with the
> EC2 database. Your fix sounds great and exactly what we need here.

FYI, I've implemented the fix and committed it to the develop branch on
github. Please test the 'develop' branch if you can and let me know if
you see the InvalidInstanceID.NotFound error again.
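
For readers following the thread: the committed fix is not shown in this message, so the snippet below is only a generic sketch of one common way to tolerate an eventual-consistency window, namely retrying a lookup with backoff until the new instance becomes visible. The names InstanceNotFound and retry_until_visible are hypothetical and are not part of StarCluster or of any EC2 client library.

```python
import time


class InstanceNotFound(Exception):
    """Hypothetical stand-in for an InvalidInstanceID.NotFound response."""


def retry_until_visible(fetch, attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops raising InstanceNotFound.

    Waits delay, 2*delay, 4*delay, ... seconds between attempts and
    re-raises the error if the final attempt still fails.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except InstanceNotFound:
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

Here fetch would be whatever call looks up the freshly launched instance; the sleep parameter exists only so the wait can be replaced in tests.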

> I did check the logs and they do show the nfs errors. However, I
> don't think (can't recall) that the errors were also printed to the
> screen, but they were definitely in the log file in /tmp. I did not
> think to check /etc/exports on the master nor the /etc/fstab on the
> slave nodes. I will be sure to check next time the issue occurs.
> I did code the 'error recovery' into our plugin. However, I know
> that it still has problems, for it failed the one time a new
> cluster showed the nfs problems. However, that was a while ago and we
> just terminated the cluster and spun up a new one. I still need to
> debug this error handling code but that will be tricky (but not
> impossible).
> Next time I am able to catch a cluster with a bad nfs, I will do the
> checks you suggest and also look to move the plugin code to do the
> remount as in your code snippet.

Sounds good. Please send the relevant lines in the debug log the next
time this occurs as I'm very curious to know what the underlying issue is.

> That works great for me. Many thanks.

Great, I've committed the debug logging format changes to the 'develop'
branch. You should now see %(asctime)s in the debug file.
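
As an illustration of what %(asctime)s does in a Python logging format string, here is a minimal standalone sketch. The format string and the logger name are assumptions made for the example; they are not StarCluster's actual debug-log settings.

```python
import io
import logging

# Capture output in memory so the formatted line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# %(asctime)s expands to the time the record was created, by default
# in the form "YYYY-MM-DD HH:MM:SS,mmm".
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s - %(message)s")
)

log = logging.getLogger("debug_format_example")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False  # keep the example's output out of the root logger

log.debug("mounting NFS shares")
line = stream.getvalue()
print(line, end="")
```

With timestamps on each debug line, it becomes easier to line up errors such as the NFS failures above with the step in cluster setup that produced them.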


Received on Thu Aug 04 2011 - 13:03:58 EDT