Re: Parallelization of MPI application with Star Cluster
Oops, that should read Intel Xeon E5-2670 v2 with 10 physical cores (20
threads). The conclusion about oversubscribing the 2-node c3.4xlarge-based
cluster still stands, though.
Gonçalo
On Sun, May 11, 2014 at 9:18 PM, Gonçalo Albuquerque <albusquercus_at_gmail.com> wrote:
> Hi Torstein,
>
> Here are my 2 cents. To the best of my knowledge, the C3 instances are
> based on two-socket Intel Xeon E5-2670 servers. This means 2 x 8 = 16
> physical cores per server (32 threads with hyper-threading on). Each
> c3.4xlarge exposes 16 vCPUs, i.e. 8 physical cores, so your two c3.4xlarge
> nodes only have 16 physical cores between them. By running a 32-process MPI
> job on a 2-node c3.4xlarge cluster you are actually oversubscribing the
> available computational resources, hence you see no further gain.
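>
> As a quick sanity check you can look at the core/thread split directly on
> one of the nodes. A minimal sketch, assuming the standard lscpu from
> util-linux is present on the StarCluster Ubuntu image:
>
>   lscpu | egrep '^CPU\(s\)|Socket|Core|Thread'
>   # Socket(s) x Core(s) per socket = physical cores;
>   # CPU(s) is the hyper-thread count, i.e. what EC2 reports as vCPUs.
>
> and then size the MPI job to the physical core count, e.g. (the input and
> output file names are just placeholders):
>
>   mpirun -np 16 pw.x < scf.in > scf.out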
>
> Can you try with c3.8xlarge instances? Two c3.8xlarge nodes will provide
> you with 32 physical cores.
>
> Gonçalo
>
>
> On Fri, May 9, 2014 at 7:42 PM, Torstein Fjermestad <tfjermestad_at_gmail.com> wrote:
>
>> Dear Rayson,
>>
>> Thank you for your fast and informative reply. I have been studying the
>> AWS and StarCluster documentation, and as far as I understand, the VPC
>> and the placement group are set up automatically. From the management
>> console I can see that all instances are in the same placement group and
>> have the same VPC ID.
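>>
>> In case it is useful to others, the same information can also be pulled
>> with the AWS CLI (a sketch only, assuming the CLI is installed and
>> configured):
>>
>>   aws ec2 describe-instances \
>>     --query 'Reservations[].Instances[].[InstanceId,InstanceType,Placement.GroupName,VpcId]' \
>>     --output table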
>>
>> The instance types I am running are the following:
>>
>> c3.large for the master node
>> c3.4xlarge for the two slave nodes
>>
>> Today I redid the scaling test, and when adding the two c3.4xlarge nodes,
>> I specified explicitly that they should be based on the HVM-EBS
>> image (by using the -i option to addnode). I think I forgot to do this
>> yesterday.
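>>
>> For reference, the command was along these lines, run once per node to
>> add ("mycluster" is just a placeholder for the actual cluster tag):
>>
>>   starcluster addnode -i ami-ca4abfbd mycluster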
>> The results for 2, 4, 8, and 16 processors are now much better:
>>
>> # proc   CPU time    wall time
>>      2   7m45.70s    8m19.11s
>>      4   3m28.29s    3m22.40s
>>      8   2m22.33s    2m18.33s
>>     16   1m18.18s    1m20.59s
>>     32   1m 0.05s    3m 8.53s
>>
>> The exception is the result for 32 processors, where the difference
>> between wall time and CPU time is again large. Does anyone have any
>> suggestions as to what might be causing the poor performance of the
>> 32-processor calculation?
>>
>> Thanks in advance for your help.
>>
>> Regards,
>> Torstein Fjermestad
>>
>>
>>
>> On Fri, May 9, 2014 at 6:30 AM, Rayson Ho <raysonlogin_at_gmail.com> wrote:
>>
>>> We benchmarked AWS enhanced networking late last year & beginning of
>>> this year:
>>>
>>>
>>> http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
>>>
>>> http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html
>>>
>>> There are a few things that can affect MPI performance on AWS with
>>> enhanced networking:
>>>
>>> 1) Make sure that you are using a VPC, because instances outside a VPC
>>> fall back to standard networking.
>>>
>>> 2) Make sure that your instances are all in an AWS Placement Group, or
>>> else the latency will be much higher.
>>>
>>> 3) Finally, you didn't specify the instance type -- it's important to
>>> know what kind of instances you used to perform the test...
>>>
>>> Rayson
>>>
>>> ==================================================
>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
>>>
>>>
>>> On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad <tfjermestad_at_gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am planning to use StarCluster to run Quantum Espresso (
>>>> http://www.quantum-espresso.org/) calculations. For those who are not
>>>> familiar with Quantum Espresso: it is a code for running quantum
>>>> mechanical calculations on materials. For these calculations to achieve
>>>> good scaling with the number of CPUs, fast communication hardware is
>>>> necessary.
>>>>
>>>> For this reason, I configured a cluster based on the HVM-EBS image:
>>>>
>>>> [1] ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
>>>> (HVM-EBS)
>>>>
>>>> Then I followed the instructions on this site
>>>>
>>>>
>>>> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#test-enhanced-networking
>>>>
>>>> to check that "enhanced networking" was indeed enabled. Running the
>>>> suggested commands gave me the same output as in the examples, which
>>>> indicates that "enhanced networking" is enabled in the image.
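>>>>
>>>> The check essentially amounts to confirming that the instances use the
>>>> ixgbevf driver (the interface name may differ on other setups):
>>>>
>>>>   ethtool -i eth0     # should report "driver: ixgbevf"
>>>>   modinfo ixgbevf     # shows the module version shipped in the image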
>>>>
>>>> On this image I installed Quantum Espresso (using apt-get install) and
>>>> created a new, modified image from which I launched the final cluster.
>>>>
>>>> On this cluster, I carried out some parallelization tests by running
>>>> the same Quantum Espresso calculation on different numbers of CPUs. The
>>>> results are presented below:
>>>>
>>>> # proc   CPU time    wall time
>>>>      4   4m23.98s    5m 0.10s
>>>>      8   2m46.25s    2m49.30s
>>>>     16   1m40.98s    4m 2.82s
>>>>     32   0m57.70s    3m36.15s
>>>>
>>>> Except for the test run with 8 CPUs, the wall time is significantly
>>>> longer than the CPU time. This is usually an indication of slow
>>>> communication between the CPUs/nodes.
>>>>
>>>> My question is therefore whether there is a way to check the
>>>> communication speed between the nodes / CPUs.
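>>>>
>>>> One thing I considered is measuring the raw TCP bandwidth between two
>>>> nodes with iperf, roughly as follows (assuming iperf is available, e.g.
>>>> via apt-get; node001/node002 are the node aliases in my cluster):
>>>>
>>>>   iperf -s                   # on node001: start the server
>>>>   iperf -c node001 -t 10     # on node002: measure bandwidth for 10 s
>>>>
>>>> but that of course says nothing about MPI latency.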
>>>>
>>>> The large difference between the CPU time and wall time may also be
>>>> caused by an incorrect configuration of the cluster. Is there something
>>>> I have done wrong or forgotten?
>>>>
>>>> Does anyone have suggestions on how I can fix this parallelization
>>>> issue?
>>>>
>>>> Thanks in advance for your help.
>>>>
>>>> Regards,
>>>> Torstein Fjermestad
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>