StarCluster - Mailing List Archive

Re: IPython parallel DirectView not distributing to starcluster nodes.

From: Austin So <no email>
Date: Wed, 1 Jul 2015 16:40:12 -0700

Thanks.

Yes, I started checking before and after each line executes, but I see nothing different. So, after a relaunch:

>>> IPCluster has been started on SecurityGroup:@sc-toma for user 'sgeadmin'
with 159 engines on 5 nodes.

In IP[y]: Notebook

Through IP[y]: Notebook, I’m assigning 150 engines.

So using the following test code:

from IPython.parallel import Client
ipclient = Client(packer = 'pickle')
dview = ipclient[:]
lview = ipclient.load_balanced_view()

Here len(ipclient.ids) returns 150, as expected.

Now executing code:

%%px --local
import socket
import pandas as pd
def distribute(a):
        return socket.getfqdn()

a = pd.DataFrame(range(0,10000))

version 1a:
dview.map(distribute, a[0]).get()

version 1b:
lview.map(distribute, a[0]).get()

--This results in an output of ‘master’ in each element.

version 2:
dview.scatter('a', a)
dview.execute('b = distribute(a[0])', block=True)
dview.gather('b', block=True)

--This also results in an output of ‘master’ in each element.

Verifying with len(ipclient.ids) confirms that all the engines are still registered.
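
A minimal sketch of a per-engine check, separate from the map/scatter calls above: ask every engine for its own hostname and tally the answers. The helper names `tally_hosts` and `looks_distributed` are hypothetical, and the sample hostnames are made up for illustration, not real cluster output. The commented lines assume the `dview` DirectView from the test code and use its `apply_sync` call, which returns one result per engine.

```python
from collections import Counter


def tally_hosts(hostnames):
    """Count how many engines (or task results) reported each hostname."""
    return Counter(hostnames)


def looks_distributed(hostnames, expected_hosts):
    """Return True when results came from at least `expected_hosts` distinct hosts."""
    return len(set(hostnames)) >= expected_hosts


# Local demonstration with made-up hostnames (hypothetical, not cluster output).
sample = ["master", "master", "node001", "node002"]
print(tally_hosts(sample))
print(looks_distributed(sample, 5))

# Against a live cluster (assumes `dview = ipclient[:]` as in the test code):
#   import socket
#   hosts = dview.apply_sync(socket.getfqdn)  # one hostname per engine
#   print(tally_hosts(hosts))
```

If the tally shows a single hostname for all 150 engines, the engines themselves are probably all running on that one machine, which would point at how the engines were launched rather than at the map or scatter code.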

A.





> On Jul 1, 2015, at 11:14 AM, MinRK <benjaminrk_at_gmail.com> wrote:
>
> Can you perhaps share a code sample? Have you verified that all the engines are registered with the Client (`Client.ids`) before submitting the tasks?
>
> -MinRK
>
> On Wed, Jul 1, 2015 at 9:09 AM, Austin So <austin.so_at_tomabio.com> wrote:
> I’ve been trying to figure out what I’m doing wrong here, and if it is an issue within the starcluster config file. I’ve exhausted all possible implementations in my code that I could think of.
>
> During setup, StarCluster reported at launch that 255 engines had been set up and were available to IPCluster.
>
> Within IPython Notebooks, I’m trying to distribute a function across all my nodes and engines (5@r3.8xlarge).
>
> So when the line of code is running, I’m looking at qhost, and I see that only the Master is showing a CPU load. I look at the Cloud Metrics, and I see that only the Master is showing a CPU load. At the suggestion of a friend, I returned the result of a socket.getfqdn() call to identify whether the results were processed by the master or one of the nodes. All results returned were generated by the Master.
>
> Any hints to identify where the source of the problem lies would be greatly appreciated.
>
> Best
>
> Austin
>
>
>
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster_at_mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
Received on Wed Jul 01 2015 - 19:40:17 EDT