StarCluster - Mailing List Archive

Re: [Starcluster] failed cluster / detached drive

From: Justin Riley <no email>
Date: Sat, 01 May 2010 08:00:06 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dan,

Wow, that's bizarre. I'm not sure why the EBS volume would have been
detached on you; I haven't had that happen to me yet (fingers crossed). Of
course, with the volume mounted on /home, a random detach would cause
the other symptoms you were having. I would still expect root to be
able to run qstat, though...hmmmm.
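
A minimal sketch of a check that could be run as root on the master node
if this happens again, assuming the volume is supposed to be mounted on
/home and that qstat should be on root's PATH. The function name
check_cluster_health is hypothetical and not part of StarCluster; it only
uses the Python standard library.

```python
import os
import shutil


def check_cluster_health(mount_point="/home", command="qstat"):
    """Report whether a volume is mounted and a command is on PATH.

    Both defaults are assumptions taken from this thread: the EBS volume
    is expected at /home and SGE's qstat is expected to be runnable.
    """
    return {
        # True only if something is actually mounted at mount_point
        "mounted": os.path.ismount(mount_point),
        # Full path to the command if it is on PATH, otherwise None
        "command_path": shutil.which(command),
    }


if __name__ == "__main__":
    print(check_cluster_health())
```

If "mounted" comes back False while the instance is still up, that would
point at the volume having been detached or unmounted rather than at SGE
itself.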

Unfortunately I don't have any ideas on this one, other than that this is
some random AWS failure with EBS.

> If it happens again I'll keep the cluster up and let you know right away.

Yes please do, that would be useful thanks!

~Justin

On 04/30/2010 08:20 PM, Dan Yamins wrote:
> Justin, I just had a strange situation where my cluster suddenly
> failed. Here were the symptoms:
>
> 1) All my active ssh terminals timed out.
> 2) I couldn't log back in as the CLUSTER_USER (I got the "permission
> denied (public key)" error), though I could ssh in as root.
> 3) The mounted EBS volume appears to have disappeared, e.g. when I
> tried to cd to it from /root, it was reported as not existing.
> 4) The SGE "qstat" command was not recognized (e.g. when I ran
> "qstat -xml" as root I got an error about the qstat command not being found).
>
> It seems like my EBS drive might have detached ... but lots of things
> could have happened. Any thoughts?
>
> Anyway, I killed the cluster as I didn't want to keep paying for it. I'm
> starting another one now, and will let you know what the result is. If
> it happens again I'll keep the cluster up and let you know right away.
>
> Dan
>
>
>
> _______________________________________________
> Starcluster mailing list
> Starcluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvcF8YACgkQ4llAkMfDcrmRvgCggYAvHkj8eAQ4erT85cl6UG48
8msAnAolLywgCTHg7XAvxSJVc7mJPLHw
=oYyj
-----END PGP SIGNATURE-----
Received on Sat May 01 2010 - 08:00:10 EDT