Dear all,
I am planning to use StarCluster to run Quantum Espresso
(http://www.quantum-espresso.org/) calculations. For those who are not
familiar with Quantum Espresso, it is a code for running quantum mechanical
calculations on materials. For these types of calculations to scale well
with the number of CPUs, fast communication hardware is necessary.
For this reason, I configured a cluster based on the HVM-EBS image:
[1] ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
(HVM-EBS)
Then I followed the instructions on this site
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#test-enhanced-networking
to check that "enhanced networking" was indeed enabled. Running the
suggested commands gave me the same output as in the examples, which
indicates that "enhanced networking" is enabled on the image.
On this image I installed Quantum Espresso (using apt-get install) and
created a new, modified image, from which I launched the final cluster.
On this cluster, I carried out some parallelization tests by running the
same Quantum Espresso calculation on different numbers of CPUs. I present
the results below:
# proc    CPU time    wall time
4         4m23.98s    5m 0.10s
8         2m46.25s    2m49.30s
16        1m40.98s    4m 2.82s
32        0m57.70s    3m36.15s
Except for the test run with 8 CPUs, the wall time is significantly longer
than the CPU time. This is usually an indication of slow communication
between the CPUs/nodes.
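To make the gap easier to quantify, here is a minimal Python sketch that converts the timings listed above to seconds and computes, for each run, the ratio of wall time to CPU time and the wall-time speedup relative to the 4-process run. The function names (to_seconds, summarize) are my own and purely illustrative; the only inputs are the numbers from my tests.

```python
import re

# Timings copied from the results above: (processes, CPU time, wall time).
TIMINGS = [
    (4, "4m23.98s", "5m 0.10s"),
    (8, "2m46.25s", "2m49.30s"),
    (16, "1m40.98s", "4m 2.82s"),
    (32, "0m57.70s", "3m36.15s"),
]


def to_seconds(text):
    """Convert a string such as '4m23.98s' or '5m 0.10s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*([\d.]+)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(timings):
    """Return rows of (procs, cpu_s, wall_s, wall/cpu ratio, speedup vs. first row)."""
    base_wall = to_seconds(timings[0][2])
    rows = []
    for procs, cpu, wall in timings:
        cpu_s = to_seconds(cpu)
        wall_s = to_seconds(wall)
        # A wall/cpu ratio well above 1 means time spent outside computation
        # (waiting, communication, I/O); speedup compares wall times only.
        rows.append((procs, cpu_s, wall_s, wall_s / cpu_s, base_wall / wall_s))
    return rows


if __name__ == "__main__":
    for procs, cpu_s, wall_s, ratio, speedup in summarize(TIMINGS):
        print(f"{procs:>3} procs  cpu {cpu_s:7.2f}s  wall {wall_s:7.2f}s  "
              f"wall/cpu {ratio:5.2f}  speedup vs 4 procs {speedup:5.2f}")
```

A ratio close to 1 means the run was almost entirely compute-bound. A ratio that grows with the number of processes is what I would expect if communication (or some other form of waiting) dominates.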
My question is therefore whether there is a way to check the communication
speed between the nodes / CPUs.
The large difference between the CPU time and wall time may also be caused
by an incorrect configuration of the cluster. Is there something I have
done wrong / forgotten?
Does anyone have suggestions on how I can fix this parallelization issue?
Thanks in advance for your help.
Regards,
Torstein Fjermestad
Received on Thu May 08 2014 - 13:30:17 EDT