EC2 latency and MPI

From: Jeff Howbert <no email>
Date: Wed, 16 Feb 2011 15:21:38 -0800

Hello -

Just discovered the StarCluster system, and enjoyed looking through the
site. Looks like a very well thought out design. Hope to try it sometime
in the near future.

At Insilicos (www.insilicos.com) we have built a similar platform for
parallel computing in the AWS cloud. Like StarCluster, it uses MPI for
internode communication. We're curious how reliable MPI has been for you.
Occasionally, we have problems bringing up a cluster because mpdboot fails
to establish a communications ring. It's pretty clear this is due to higher
than usual latency between nodes. We take the obvious precaution of making
sure all nodes are provisioned in the same availability zone, and have even
tweaked the timeout tolerances inside mpdboot.py, but there are still
sporadic bad days when the problem occurs.
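One common way to ride out intermittent startup failures like this is to retry the boot step with an increasing delay between attempts. The following is only a minimal sketch of that idea, not the approach used by Insilicos or StarCluster; the helper names run_command and retry_with_backoff, the attempt count, and the delays are all hypothetical choices for illustration.

```python
import subprocess
import time


def run_command(cmd, timeout_s):
    """Run cmd (a list of arguments); True only if it exits with status 0 in time."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False


def retry_with_backoff(attempt, max_attempts=5, base_delay_s=2.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_attempts is reached.

    Waits base_delay_s, then 2x, 4x, ... between tries. Returns the 1-based
    number of the successful try, or 0 if every try failed.
    """
    for i in range(max_attempts):
        if attempt():
            return i + 1
        if i < max_attempts - 1:
            # Hypothetical policy: double the wait after each failure.
            sleep(base_delay_s * (2 ** i))
    return 0
```

A call might look like retry_with_backoff(lambda: run_command(["mpdboot", "-n", "4", "-f", "mpd.hosts"], timeout_s=60)), where the node count and the host file name are placeholders. Retrying would not lower the latency itself; it only gives the ring another chance to form once a transient spike has passed.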

Anything you can share on your history (or lack thereof) with this problem,
and approaches to resolving it, would be much appreciated.

Regards,

Jeff Howbert

Received on Wed Feb 16 2011 - 18:21:58 EST

This archive was generated by hypermail 2.3.0.
