Re: StarCluster Digest, Vol 22, Issue 1
-----BEGIN PGP SIGNED MESSAGE-----
On 07/27/2011 01:56 AM, Don MacMillen wrote:
> Yes, I thought it must be an 'eventual consistency' issue with the
> EC2 database. Your fix sounds great and exactly what we need here.
FYI, I've implemented the fix and committed it to the develop branch on
github. Please test the 'develop' branch if you can and let me know if
you see the InvalidInstanceID.NotFound error again.
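
For anyone following the thread, here is a minimal sketch of the general idea behind working around an eventual-consistency error: retry the lookup with a growing delay until the new instance becomes visible. This is not the code committed to 'develop'. The InstanceNotFound class and the retry_until_visible helper are hypothetical names used only for illustration.

```python
import time


class InstanceNotFound(Exception):
    """Hypothetical stand-in for EC2's InvalidInstanceID.NotFound error."""


def retry_until_visible(fetch, retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops raising InstanceNotFound.

    fetch   -- zero-argument callable that looks up the instance
    retries -- maximum number of attempts before giving up
    delay   -- initial wait in seconds, doubled after each failure
    sleep   -- injectable sleep function (makes the helper easy to test)
    """
    for attempt in range(retries):
        try:
            return fetch()
        except InstanceNotFound:
            if attempt == retries - 1:
                # Out of attempts: let the caller see the original error.
                raise
            sleep(delay)
            delay *= 2
```

The lookup is passed in as a callable so the same helper could wrap any call that may briefly fail right after an instance is launched.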
> I did check the logs and they do show the nfs errors. I don't think
> (can't recall) that the errors were also printed to the screen, but
> they were definitely in the log file in /tmp. I did not think to
> check /etc/exports on the master or /etc/fstab on the slave nodes.
> I will be sure to check the next time the issue occurs.
> I did code the 'error recovery' into our plugin. However, I know
> that it still has problems, because it failed the one time a new
> cluster showed the nfs problems. That was a while ago, though, and we
> just terminated the cluster and spun up a new one. I still need to
> debug this error handling code but that will be tricky (but not
> Next time I am able to catch a cluster with a bad nfs, I will do the
> checks you suggest and also look to move the plugin code to do the
> remount as in your code snippet.
Sounds good. Please send the relevant lines from the debug log the next
time this occurs, as I'm very curious to know what the underlying issue is.
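
As a rough aid for the /etc/fstab check mentioned above, the sketch below lists the NFS entries found in fstab-formatted text. It assumes the standard whitespace-separated fstab layout (device, mount point, filesystem type, ...). The nfs_mounts name is hypothetical and this is not part of StarCluster.

```python
def nfs_mounts(fstab_text):
    """Return (device, mount_point) pairs for NFS entries in fstab text."""
    mounts = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Field 3 is the filesystem type, e.g. "nfs" or "nfs4".
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            mounts.append((fields[0], fields[1]))
    return mounts
```

On a node it could be used as nfs_mounts(open("/etc/fstab").read()). An empty result on a slave node would suggest the NFS entries were never written.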
> That works great for me. Many thanks.
Great, I've committed the debug logging format changes to the 'develop'
branch. You should now see timestamps (%(asctime)s) in the debug file.
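
For reference, a minimal sketch of how %(asctime)s works in a Python logging format string. The logger name and the exact format string below are assumptions for illustration, not the ones StarCluster uses.

```python
import logging


def make_debug_logger(stream):
    """Build a logger whose records are prefixed with a timestamp."""
    logger = logging.getLogger("debug_format_sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # %(asctime)s expands to the time the record was created,
    # e.g. "2011-08-04 13:03:58,123" with the default date format.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s - %(message)s"))
    logger.handlers = [handler]
    return logger
```

Calling make_debug_logger(sys.stderr).debug("mounting nfs") would then write a line that starts with the timestamp, followed by the level and the message.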
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
-----END PGP SIGNATURE-----
Received on Thu Aug 04 2011 - 13:03:58 EDT