Thank you for your fast and informative reply. I have been studying the AWS
and StarCluster documentation, and as far as I have understood, the VPC and
the placement group are set up automatically. From the management console I
see that all instances are in the same placement group and in the same VPC.
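For anyone who wants to verify this from the command line rather than the
console, something like the following should work (assuming the AWS CLI is
installed and configured):

    # list each instance together with its placement group and VPC ID
    aws ec2 describe-instances \
        --query 'Reservations[].Instances[].[InstanceId,Placement.GroupName,VpcId]' \
        --output table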
The instance types I am running are the following:
c3.large for the master node
c3.4xlarge for the two slave nodes
Today I redid the scaling test, and when adding the two c3.4xlarge nodes, I
explicitly specified that they should be based on the HVM-EBS image (by
using the -i option to addnode). I think I forgot to do this the first time.
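For the record, the command was along these lines ("mycluster" is just a
placeholder for the cluster tag):

    # add two nodes based explicitly on the HVM-EBS image
    starcluster addnode -n 2 -i ami-ca4abfbd -I c3.4xlarge mycluster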
The results for 2, 4, 8, and 16 processors are now much better:

# proc   CPU time    wall time
     2   7m45.70s    8m19.11s
     4   3m28.29s    3m22.40s
     8   2m22.33s    2m18.33s
    16   1m18.18s    1m20.59s
    32   1m 0.05s    3m 8.53s
The exception is the result for 32 processors, where the difference between
the wall time and the CPU time is again large. Does anyone have any
suggestions as to what might be causing the poor performance of the
calculation on 32 processors?
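In the meantime, I will try to measure the inter-node latency and bandwidth
directly with the OSU micro-benchmarks, along these lines (the node names
are placeholders, and this assumes Open MPI's mpirun and a compiled copy of
the benchmarks):

    # point-to-point latency between two cluster nodes
    mpirun -np 2 -host node001,node002 ./osu_latency
    # point-to-point bandwidth between the same pair of nodes
    mpirun -np 2 -host node001,node002 ./osu_bw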
Thanks in advance for your help.
On Fri, May 9, 2014 at 6:30 AM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
> We benchmarked AWS enhanced networking late last year & at the beginning
> of this year.
> There are a few things that can affect MPI performance on AWS with
> enhanced networking:
> 1) Make sure that you are using a VPC, because instances outside a VPC
> fall back to standard networking.
> 2) Make sure that your instances are all in an AWS Placement Group, or else
> the latency will be much higher.
> 3) Finally, you didn't specify the instance type -- it's important to know
> what kind of instances you used to perform the test...
> Open Grid Scheduler - The Official Open Source Grid Engine
> On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad
> <tfjermestad_at_gmail.com> wrote:
>> Dear all,
>> I am planning to use StarCluster to run Quantum Espresso
>> (http://www.quantum-espresso.org/) calculations. For those who are not
>> familiar with Quantum Espresso: it is a code for running quantum mechanical
>> calculations on materials. For these types of calculations to scale well
>> with the number of CPUs, fast communication hardware is necessary.
>> For this reason, I configured a cluster based on the HVM-EBS image:
>>  ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
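>> The relevant part of my StarCluster config was roughly the following (the
>> cluster name is just an example):
>>
>>     [cluster mycluster]
>>     NODE_IMAGE_ID = ami-ca4abfbd
>>     CLUSTER_SIZE = 3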
>> Then I followed the instructions on this site
>> to check that "enhanced networking" was indeed enabled. Running the
>> suggested commands gave me the same output as in the examples. This
>> certainly indicated that "enhanced networking" is enabled in the image.
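>> For reference, the checks were along the lines of the following (the
>> instance ID is a placeholder):
>>
>>     # on the instance: the ixgbevf driver indicates enhanced networking
>>     ethtool -i eth0
>>     # from a machine with the AWS CLI: the attribute should report "simple"
>>     aws ec2 describe-instance-attribute --instance-id i-xxxxxxxx \
>>         --attribute sriovNetSupport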
>> On this image I installed Quantum Espresso (via apt-get install), and
>> from it I generated a new, modified image from which I launched the final
>> cluster.
>> On this cluster, I carried out some parallelization tests by running the
>> same Quantum Espresso calculation on different numbers of CPUs. I present
>> the results below:
>> # proc   CPU time    wall time
>>      4   4m23.98s    5m 0.10s
>>      8   2m46.25s    2m49.30s
>>     16   1m40.98s    4m 2.82s
>>     32   0m57.70s
>> Except for the test run with 8 CPUs, the wall time is significantly
>> longer than the CPU time. This is usually an indication of slow
>> communication between the CPUs/nodes.
>> My question is therefore whether there is a way to check the
>> communication speed between the nodes / CPUs.
>> The large difference between the CPU time and wall time may also be
>> caused by an incorrect configuration of the cluster. Is there something I
>> have done wrong / forgotten?
>> Does anyone have suggestions on how I can fix this parallelization issue?
>> Thanks in advance for your help.
>> Torstein Fjermestad