Running MPI over the cluster
Hi,
I am running a Python program for a distributed implementation of
gradient descent. The gradient descent step uses an Allreduce
operation to combine the workers' local gradients into the overall
gradient. I had been setting up clusters without StarCluster before,
but I recently needed larger clusters and moved to StarCluster. I was
surprised to see that the MPI.Allreduce operation is much faster on a
cluster launched by StarCluster than on one set up with traditional
methods. I am curious whether this is an artifact, or whether
StarCluster somehow optimizes the communication network to enable an
efficient Allreduce step. I am attaching the code for a cluster of 43
nodes (1 master and 42 workers), though I have replaced the data
loading with random data initialization to mimic the gradient step.
Any insight regarding this would be extremely helpful.
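(The attachment itself is not preserved in this archive. For readers, a minimal mpi4py sketch of the kind of step described above might look like the following; the dimension, seeding, and averaging are my assumptions, not the author's actual script, and the random gradient stands in for the real data-dependent one. It falls back to a single simulated rank if mpi4py is not installed, so it runs anywhere.)

```python
# Hypothetical sketch (not the attached script): each worker computes a
# local gradient on random data, then Allreduce sums the gradients so
# every rank ends up holding the same global gradient.
# With MPI available, launch as e.g.: mpiexec -n 43 python sketch.py
import numpy as np

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Fallback so the sketch still runs without an MPI installation:
    # behave as a single "worker".
    MPI, comm, rank, size = None, None, 0, 1

DIM = 1000  # model dimension (placeholder; the real code loads data instead)
rng = np.random.default_rng(seed=rank)          # per-rank random data
local_grad = rng.standard_normal(DIM)           # stands in for a worker's gradient

global_grad = np.empty_like(local_grad)
if comm is not None:
    # Sum the per-worker gradients; every rank receives the same result.
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
else:
    global_grad[:] = local_grad

global_grad /= size  # average the gradient across workers
```

Whether StarCluster is actually faster here would depend on things a sketch cannot show, such as instance placement and the MPI implementation's collective algorithms, which is presumably the heart of the question.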
Thanks,
Saurav.
Received on Thu Oct 25 2018 - 04:44:33 EDT