StarCluster - Mailing List Archive

Re: MPICH Fabric

From: David Stuebe <no email>
Date: Fri, 16 May 2014 16:45:45 +0000

Hi Rayson

I have written a plugin to run the Intel installer from a tgz file on S3. It takes a while, but it seems to work.

I am using one of the stock public images:
[0] ami-3393a45a us-east-1 starcluster-base-ubuntu-13.04-x86_64 (EBS)

Should I be using the HVM image? I understood HVM images were only needed for GPU computing.

How can I tell whether I have the enhanced networking driver set up correctly? On the paravirtual instance, lspci and similar tools show nothing.
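One common way to check is to look at which driver backs the network interface. Per the AWS documentation, enhanced networking is only supported on HVM instances, so a paravirtual instance will not show it. The sketch below is a hypothetical helper, not part of StarCluster: it assumes the interface is named `eth0`, that `ethtool` is installed, and that on C3 instances enhanced networking shows up as the Intel 82599 VF driver `ixgbevf`.

```python
import subprocess
from typing import Optional


def nic_driver(ethtool_output: str) -> Optional[str]:
    """Return the value of the 'driver:' line from `ethtool -i` output, or None."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            return value.strip()
    return None


def enhanced_networking_active(interface: str = "eth0") -> bool:
    # Assumption: on C3 instances, enhanced networking uses the Intel 82599
    # Virtual Function interface, whose Linux driver is reported as "ixgbevf".
    out = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    return nic_driver(out) == "ixgbevf"
```

Calling `enhanced_networking_active()` on each node should return True only where the VF driver is actually bound to the interface.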

Thanks for the suggestions!

David Stuebe
Scientist & Software Engineer – RPS ASA

55 Village Square Drive
South Kingstown, RI 02879-8248

Tel: +1 (401) 789-6224
Email: David.Stuebe_at_rpsgroup.com
www: asascience.com | rpsgroup.com

A member of the RPS Group plc

From: Rayson Ho <raysonlogin_at_gmail.com>
Date: Fri, 16 May 2014 09:13:03 -0400
To: David Stuebe <dstuebe_at_asascience.com>
Cc: starcluster_at_mit.edu
Subject: Re: [StarCluster] MPICH Fabric

How are you deploying the Intel Cluster Compiler Suite? If you are using a custom AMI, make sure that you have the AWS enhanced networking NIC driver set up correctly, and also make sure that your instances are all in a placement group and in a VPC (both should be set up by StarCluster if you are using the latest stable version).

We benchmarked AWS enhanced networking on the C3 family a few months ago, and the latency is around 20% better on a pair of C3.8xlarge instances in a placement group with AWS enhanced networking enabled:

http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Thu, May 15, 2014 at 3:18 PM, David Stuebe <DStuebe_at_asascience.com> wrote:

Hi StarCluster,

Does anyone have advice on which fabric to use when running on AWS?

I know the interconnect is supposed to be 10 GigE, but my model is more dependent on latency than on throughput.
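The latency-versus-throughput distinction can be made concrete with the standard Hockney (alpha-beta) model, where the time to send a message is a fixed latency plus size divided by bandwidth. The sketch below uses hypothetical parameter values chosen for illustration only, not measurements from AWS; it shows that for small messages, typical of tightly coupled models exchanging halos, the fixed latency dominates even on a 10 Gbit/s link.

```python
def transfer_time_us(message_bytes: float, latency_us: float, bandwidth_gbit_s: float) -> float:
    """Hockney (alpha-beta) model: time = latency + size / bandwidth, in microseconds."""
    bytes_per_us = bandwidth_gbit_s * 1e9 / 8 / 1e6  # convert Gbit/s to bytes per microsecond
    return latency_us + message_bytes / bandwidth_gbit_s / 1e9 * 8 * 1e6 if False else latency_us + message_bytes / bytes_per_us


# Hypothetical parameters for illustration only (not measured on AWS).
LATENCY_US = 50.0
BANDWIDTH_GBIT_S = 10.0

for size in (64, 4096, 1_000_000):
    t = transfer_time_us(size, LATENCY_US, BANDWIDTH_GBIT_S)
    # Fraction of the total transfer time spent on the fixed per-message latency.
    print(f"{size:>9} B: {t:8.1f} us, latency share {LATENCY_US / t:.0%}")
```

Under these assumed numbers, lowering latency helps small messages far more than raising bandwidth would.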

I have had to use the Intel Cluster Compiler Suite rather than the built-in OpenMPI. I am hoping to resolve those issues and compare the two; I am interested to see the performance differences.

Currently the model actually gets slower (negative scaling) as I add processors beyond a single node.

Model performance when running on Amazon:

Configuration                      IINT     SIMTIME(UTC)                FINISH IN      SECS/IT  PERCENT COMPLETE
C3.8xlarge, 1 instance, 32 cores   8396282  2014-03-18T00:18:02.000000  0000:07:54:19  0.1103   | |
C3.8xlarge, 2 instances, 64 cores  8395221  2014-03-18T00:00:21.000000  0000:20:27:58  0.2843   | |
C3.8xlarge, 3 instances, 96 cores  8395273  2014-03-18T00:01:13.000000  0007:18:33:19  2.5918   | |
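To put a number on the negative scaling, the SECS/IT column can be compared against the single-instance baseline. The short sketch below copies only the SECS/IT values from the runs above; "speedup" and "parallel efficiency" are the usual textbook definitions, not output of the model itself.

```python
# SECS/IT values from the runs above, keyed by number of C3.8xlarge instances.
secs_per_iter = {1: 0.1103, 2: 0.2843, 3: 2.5918}


def scaling_report(timings: dict) -> dict:
    """Map instance count -> (speedup vs. smallest run, parallel efficiency)."""
    base_instances = min(timings)
    base_time = timings[base_instances]
    report = {}
    for instances, t in sorted(timings.items()):
        speedup = base_time / t  # > 1 means faster per iteration than the baseline
        efficiency = speedup / (instances / base_instances)  # fraction of ideal linear scaling
        report[instances] = (speedup, efficiency)
    return report


for instances, (speedup, eff) in scaling_report(secs_per_iter).items():
    print(f"{instances} instance(s): speedup {speedup:.3f}x, parallel efficiency {eff:.1%}")
```

By this measure, each iteration takes roughly 2.6 times as long on 2 instances and about 23.5 times as long on 3 instances as on a single instance.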


David Stuebe

_______________________________________________
StarCluster mailing list
StarCluster_at_mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster
Received on Fri May 16 2014 - 12:45:48 EDT