Re: Parallelization of MPI application with Star Cluster
Here are my 2 cents. To the best of my knowledge, the C3 instances are
based on two-socket Intel Xeon E5-2670 servers. This means 2x8=16 physical
cores (2*16 threads with hyper-threading on) on a full server. Each of your
2 c3.4xlarge nodes will only have 2*4=8 physical cores, so 16 in total. By
running a 32-process MPI job on a 2-node c3.4xlarge cluster you're actually
oversubscribing the available computational resources, hence you see no
further gain in CPU time.
Can you try with c3.8xlarge instances? Two c3.8xlarge nodes will provide
you with 32 physical cores.
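To make the arithmetic explicit, here is a small Python sketch. It assumes the published vCPU counts (16 for c3.4xlarge, 32 for c3.8xlarge) and that each vCPU is one hyper-thread, i.e. two vCPUs per physical core. The function and table names are mine, only for illustration.

```python
# vCPU counts per instance type (assumption: taken from the AWS instance tables).
VCPUS = {"c3.4xlarge": 16, "c3.8xlarge": 32}

# Assumption: hyper-threading is on, so 2 vCPUs share one physical core.
THREADS_PER_CORE = 2


def physical_cores(instance_type, nodes):
    """Total physical cores across `nodes` instances of one type."""
    return nodes * VCPUS[instance_type] // THREADS_PER_CORE


def processes_per_core(instance_type, nodes, mpi_processes):
    """MPI processes per physical core; above 1.0 the cores are oversubscribed."""
    return mpi_processes / physical_cores(instance_type, nodes)


if __name__ == "__main__":
    for itype in ("c3.4xlarge", "c3.8xlarge"):
        cores = physical_cores(itype, 2)
        ratio = processes_per_core(itype, 2, 32)
        print(f"2 x {itype}: {cores} physical cores, "
              f"{ratio:.1f} MPI processes per core")
```

A ratio above 1.0 means several MPI processes share a physical core, which is the situation I suspect for the 32-process run.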
On Fri, May 9, 2014 at 7:42 PM, Torstein Fjermestad
> Dear Rayson,
> thank you for your fast and informative reply. I have been studying the
> AWS and the starcluster documentation, and as far as I have understood VPC
> and the placement group are set up automatically. From the management
> console I see that all instances are in the same placement group and have
> the same VPC ID.
> The instance types I am running are the following:
> c3.large for the master node
> c3.4xlarge for the two slave nodes
> Today I redid the scaling test, and when adding the two c3.4xlarge nodes,
> I specified explicitly that they should be based on the HVM-EBS
> image (by using the -i option to addnode). I think I forgot to do this
> The results for 2, 4, 8, and 16 processors are now much better:
> # proc   CPU time   wall time
> 2        7m45.70s   8m19.11s
> 4        3m28.29s   3m22.40s
> 8        2m22.33s   2m18.33s
> 16       1m18.18s   1m20.59s
> 32       1m 0.05s   3m 8.53s
> The exception is the result for 32 processors where again the difference
> between the wall time and CPU time is large. Does anyone have any
> suggestions as to what might be causing the bad performance for the
> calculation on 32 processors?
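The timings above can be turned into numbers that are easier to compare. The following Python sketch computes, for each run, the wall/CPU ratio, the speedup relative to the 2-process run, and the parallel efficiency. The helper names are illustrative, and the time format is assumed to be "<minutes>m<seconds>s" as in the table.

```python
import re

# Timings copied from the table above: processes -> (CPU time, wall time).
TIMINGS = {
    2: ("7m45.70s", "8m19.11s"),
    4: ("3m28.29s", "3m22.40s"),
    8: ("2m22.33s", "2m18.33s"),
    16: ("1m18.18s", "1m20.59s"),
    32: ("1m 0.05s", "3m 8.53s"),
}


def to_seconds(text):
    """Convert a string such as '3m 8.53s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s*([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def analyse(timings):
    """Return (procs, wall/CPU ratio, speedup, efficiency) for each run.

    Speedup and efficiency are measured against the smallest run in the table.
    """
    base_n = min(timings)
    base_wall = to_seconds(timings[base_n][1])
    rows = []
    for n in sorted(timings):
        cpu, wall = (to_seconds(t) for t in timings[n])
        speedup = base_wall / wall
        efficiency = speedup / (n / base_n)
        rows.append((n, wall / cpu, speedup, efficiency))
    return rows


if __name__ == "__main__":
    print("procs  wall/CPU  speedup  efficiency")
    for n, ratio, speedup, efficiency in analyse(TIMINGS):
        print(f"{n:5d}  {ratio:8.2f}  {speedup:7.2f}  {efficiency:10.2f}")
```

A wall/CPU ratio close to 1 means the processes spent most of their time computing; a ratio well above 1 means they spent much of the wall time waiting.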
> Thanks in advance for your help.
> Torstein Fjermestad
> On Fri, May 9, 2014 at 6:30 AM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>> We benchmarked AWS enhanced networking late last year & beginning of this
>> There are a few things that can affect MPI performance of AWS with
>> enhanced networking:
>> 1) Make sure that you are using a VPC, because instances outside a VPC
>> fall back to standard networking.
>> 2) Make sure that your instances are all in an AWS Placement Group, or
>> else the latency will be much higher.
>> 3) Finally, you didn't specify the instance type -- it's important to
>> know what kind of instances you used to perform the test...
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad <
>> tfjermestad_at_gmail.com> wrote:
>>> Dear all,
>>> I am planning to use Star Cluster to run Quantum Espresso (
>>> http://www.quantum-espresso.org/) calculations. For those who are not
>>> familiar with Quantum Espresso, it is a code for running quantum
>>> mechanical calculations on materials. In order for these types of
>>> calculations to achieve good scaling with respect to the number of
>>> CPUs, fast communication hardware is necessary.
>>> For this reason, I configured a cluster based on the HVM-EBS image:
>>>  ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
>>> Then I followed the instructions on this site
>>> to check that "enhanced networking" was indeed enabled. Running the
>>> suggested commands gave me the same output as in the examples. This
>>> certainly indicated that "enhanced networking" is enabled in the image.
>>> On this image I installed Quantum Espresso (by use of apt-get install)
>>> and I generated a new modified image from which I generated the final
>>> On this cluster, I carried out some parallelization tests by running the
>>> same Quantum Espresso calculation on different numbers of CPUs. I present
>>> the results below:
>>> # proc   CPU time   wall time
>>> 4        4m23.98s   5m 0.10s
>>> 8        2m46.25s   2m49.30s
>>> 16       1m40.98s   4m 2.82s
>>> 32       0m57.70s   3m36.15s
>>> Except for the test run with 8 CPUs, the wall time is significantly
>>> longer than the CPU time. This is usually an indication of slow
>>> communication between the CPUs/nodes.
>>> My question is therefore whether there is a way to check the
>>> communication speed between the nodes / CPUs.
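One rough way to get a feel for node-to-node latency is a TCP ping-pong: one side echoes small messages back and the other times the round trips. The Python sketch below is only an illustration with made-up function names. It measures plain TCP latency, not MPI performance, so an MPI-level ping-pong would be closer to what Quantum Espresso actually experiences. The demo runs both ends on the loopback interface; to test two nodes, the echo side would run on one node and the timing side on the other, with a host and port that are open between them.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, size):
    """Read exactly `size` bytes from `sock`."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def mean_round_trip(host, port, size=64, repeats=200):
    """Mean round-trip time in seconds for `size`-byte messages."""
    payload = b"x" * size
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so small messages are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(repeats):
            sock.sendall(payload)
            recv_exact(sock, size)
        elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    # Demo on the loopback interface; the OS picks a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    rtt = mean_round_trip("127.0.0.1", port)
    print(f"mean round trip: {rtt * 1e6:.1f} microseconds")
    listener.close()
```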
>>> The large difference between the CPU time and wall time may also be
>>> caused by an incorrect configuration of the cluster. Is there something I
>>> have done wrong / forgotten?
>>> Does anyone have suggestions on how I can fix this parallelization issue?
>>> Thanks in advance for your help.
>>> Torstein Fjermestad
>>> StarCluster mailing list
Received on Sun May 11 2014 - 15:18:57 EDT