StarCluster - Mailing List Archive

Running MPI over the cluster

From: Saurav Prakash <no email>
Date: Thu, 25 Oct 2018 01:44:13 -0700

Hi,

I am running a Python program for a distributed implementation of
gradient descent. The gradient descent step involves an Allreduce
operation to obtain the overall gradient across the workers. I had
previously been setting up clusters without StarCluster, but recently
I needed larger clusters and moved to StarCluster. I was surprised to
see that the MPI.Allreduce operation is much faster in a cluster
generated by StarCluster than in a cluster set up with traditional
methods. I am curious to know whether this is an artifact, or whether
StarCluster somehow optimizes the communication network to enable an
efficient Allreduce step. I am attaching the code for a cluster of 43
nodes (1 master and 42 workers), though I have replaced the data
loading with random data initialization to mimic the gradient step.
Any insight regarding this would be extremely helpful.
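For reference, the semantics of the Allreduce step described above can be simulated without MPI: each worker contributes a local gradient, and after a sum-Allreduce every worker holds the same elementwise sum. The sketch below uses NumPy only; the worker count matches the post, but the gradient size is illustrative and not taken from the attached AGC.py.

```python
import numpy as np

def simulate_allreduce(local_grads):
    """Simulate MPI.Allreduce with op=MPI.SUM: every worker ends up
    with an identical copy of the elementwise sum of all workers'
    local gradients."""
    total = np.sum(local_grads, axis=0)
    return [total.copy() for _ in local_grads]

# 42 workers, each with a random local gradient (mimicking the
# random-data initialization mentioned in the post).
rng = np.random.default_rng(0)
workers = [rng.standard_normal(8) for _ in range(42)]
reduced = simulate_allreduce(workers)

# Every worker now holds the same aggregated gradient.
assert all(np.allclose(g, reduced[0]) for g in reduced)
```

Note that this only reproduces the result, not the cost: the wall-clock time of a real Allreduce depends heavily on the interconnect and node placement, which is presumably why cluster setup can change its speed so much.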

Thanks,
Saurav.



  • text/plain attachment: AGC.py
Received on Thu Oct 25 2018 - 04:44:33 EDT
This archive was generated by hypermail 2.3.0.
