Re: Parallelization of MPI application with StarCluster
We benchmarked AWS enhanced networking late last year and at the beginning of
this year. There are a few things that can affect MPI performance on AWS with
enhanced networking:
1) Make sure that you are using a VPC; instances outside a VPC fall back to
standard networking.
2) Make sure that your instances are all in an AWS Placement Group, or else
the latency will be much higher.
3) Finally, you didn't specify the instance type -- it's important to know
what kind of instances you used to perform the test...
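For points (1) and (3), the SR-IOV status of an instance can also be queried
from the AWS CLI; a minimal sketch, assuming the CLI is configured (the
instance ID below is a placeholder):

```shell
# Query whether SR-IOV (enhanced networking) is enabled for an instance.
# i-0123456789abcdef0 is a placeholder; substitute your own instance ID.
# A SriovNetSupport value of "simple" means enhanced networking is on.
iid="i-0123456789abcdef0"
if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-instance-attribute \
      --instance-id "$iid" --attribute sriovNetSupport \
      || echo "query failed (check credentials and instance ID)"
else
  echo "aws CLI not installed; run this where the AWS CLI is configured"
fi
```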
Open Grid Scheduler - The Official Open Source Grid Engine
On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad wrote:
> Dear all,
> I am planning to use StarCluster to run Quantum Espresso (
> http://www.quantum-espresso.org/) calculations. For those who are not
> familiar with Quantum Espresso: it is a code for running quantum mechanical
> calculations on materials. For these types of calculations to achieve good
> scaling with respect to the number of CPUs, fast communication hardware is
> necessary.
> For this reason, I configured a cluster based on the HVM-EBS image:
>  ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
> Then I followed the instructions on this site
> to check that "enhanced networking" was indeed enabled. Running the
> suggested commands gave me the same output as in the examples, which
> indicates that "enhanced networking" is enabled in the image.
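For anyone who wants to repeat that check, here is a sketch of the usual
in-instance test, assuming the image uses the ixgbevf driver that backs
enhanced networking and that eth0 is the primary interface:

```shell
# Under enhanced networking, the primary NIC is driven by ixgbevf
# rather than the standard Xen vif driver.
driver=$(ethtool -i eth0 2>/dev/null | awk '/^driver:/ {print $2}')
echo "eth0 driver: ${driver:-unknown (ethtool or eth0 not available here)}"

# The kernel module itself should also be visible:
modinfo ixgbevf 2>/dev/null | grep '^version:' || echo "ixgbevf module not present"
```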
> On this image I installed Quantum Espresso (via apt-get install), and from
> the resulting modified image I generated the final cluster.
> On this cluster, I carried out some parallelization tests by running the
> same Quantum Espresso calculation on different numbers of CPUs. I present
> the results below:
> # proc   CPU time   wall time
>      4   4m23.98s   5m 0.10s
>      8   2m46.25s   2m49.30s
>     16   1m40.98s   4m 2.82s
>     32   0m57.70s
> Except for the test run with 8 CPUs, the wall time is significantly
> longer than the CPU time. This is usually an indication of slow
> communication between the CPUs/nodes.
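To put a number on that: converting the 16-process timings above to seconds
shows how much of the wall clock is unaccounted for by computation (the awk
helper below is only illustrative):

```shell
# Parse "XmY.YYs"-style timings into seconds, then compute the fraction
# of wall time not covered by CPU time at 16 processes; a large fraction
# points at communication/waiting rather than computation.
to_seconds() { printf '%s\n' "$1" | awk -F'[ms]' '{printf "%.2f\n", $1 * 60 + $2}'; }
cpu=$(to_seconds "1m40.98s")    # CPU time, 16 processes
wall=$(to_seconds "4m2.82s")    # wall time, 16 processes
awk -v c="$cpu" -v w="$wall" \
    'BEGIN {printf "non-compute fraction at 16 procs: %.0f%%\n", 100 * (w - c) / w}'
```

With these numbers, roughly 58% of the wall time is not spent computing,
which is consistent with a communication bottleneck.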
> My question is therefore whether there is a way to check the communication
> speed between the nodes / CPUs.
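One common way to answer that question is a point-to-point MPI benchmark such
as the OSU micro-benchmarks (NetPIPE is another option). A sketch, assuming
the benchmarks are built locally and node001/node002 are placeholder names
for two of your nodes:

```shell
# Measure MPI latency and bandwidth between two specific nodes.
# node001 and node002 are placeholder hostnames; ./osu_latency and
# ./osu_bw come from a local build of the OSU micro-benchmarks.
hosts="node001,node002"
if command -v mpirun >/dev/null 2>&1 && [ -x ./osu_latency ]; then
  mpirun -np 2 -host "$hosts" ./osu_latency   # one-way latency vs. message size
  mpirun -np 2 -host "$hosts" ./osu_bw        # bandwidth vs. message size
else
  echo "mpirun and/or the OSU benchmarks are not available here"
fi
```

With enhanced networking and a placement group, one would expect noticeably
lower small-message latency between node pairs than without them.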
> The large difference between the CPU time and the wall time may also be
> caused by an incorrect configuration of the cluster. Is there something I
> have done wrong or forgotten?
> Does anyone have suggestions on how I can fix this parallelization issue?
> Thanks in advance for your help.
> Torstein Fjermestad
> StarCluster mailing list
Received on Fri May 09 2014 - 00:30:30 EDT