StarCluster - Mailing List Archive

Re: IPython parallel DirectView not distributing to starcluster nodes.

From: Austin So <no email>
Date: Fri, 3 Jul 2015 10:17:24 -0700

Okay. So it appears to be related to IP[y]: Notebook.

Running the test script from the command line on the master shows distribution across all nodes. Running it on the same cluster through IP[y]: Notebook just shows the master.
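
To compare the two environments, one option is to ask every engine for its hostname once and tally the replies. Below is a minimal sketch of that check. The helper names `report_hostname` and `summarize_hosts` are hypothetical and not from this thread. The `apply_sync` call is shown only as a comment because it needs a running cluster, and the sample list is made-up stand-in data.

```python
from collections import Counter


def report_hostname():
    # Import inside the function so the call does not rely on names
    # already defined in the engine's namespace.
    import socket
    return socket.getfqdn()


def summarize_hosts(hostnames):
    """Return a Counter mapping each hostname to how many engines reported it."""
    return Counter(hostnames)


# With a running cluster (not executed here), the check would look like:
#   from IPython.parallel import Client
#   ipclient = Client()
#   hosts = ipclient[:].apply_sync(report_hostname)  # one hostname per engine
#   print(summarize_hosts(hosts))

# Local stand-in: a hand-written list in place of real engine replies.
sample = ["master", "master", "node001", "node002", "node001"]
counts = summarize_hosts(sample)
```

If the tally from the notebook contains a single hostname while the same tally from the command line contains five, the two clients are probably not talking to the same set of engines.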

A.


> On Jul 1, 2015, at 4:40 PM, Austin So <austin.so_at_tomabio.com> wrote:
>
> Thanks.
>
> Yes, I started checking before and after executing the line, but nothing was different. So, after a relaunch:
>
> >>> IPCluster has been started on SecurityGroup:@sc-toma for user 'sgeadmin'
> with 159 engines on 5 nodes.
>
> In IP[y]: Notebook
>
> Through IP[y]: Notebook, I’m assigning 150 engines.
>
> So using the following test code:
>
> from IPython.parallel import Client
> ipclient = Client(packer = 'pickle')
> dview = ipclient[:]
> lview = ipclient.load_balanced_view()
>
> Here len(ipclient.ids) returns 150, as expected.
>
> Now executing code:
>
> %%px --local
> import socket
> import pandas as pd
> def distribute(a):
>     return socket.getfqdn()
>
> a = pd.DataFrame(range(0,10000))
>
> version 1a:
> dview.map(distribute, a[0]).get()
>
> version 1b:
> lview.map(distribute, a[0]).get()
>
> --This results in an output of ‘master’ in each element.
>
> version 2:
> dview.scatter('a', a)
> dview.execute('b = distribute(a[0])', block=True)
> dview.gather('b', block=True)
>
> --This also results in an output of ‘master’ in each element.
>
> Verifying with len(ipclient.ids) confirms that all the engines are in place.
>
> A.
>
>
>
>
>
>> On Jul 1, 2015, at 11:14 AM, MinRK <benjaminrk_at_gmail.com> wrote:
>>
>> Can you perhaps share a code sample? Have you verified that all the engines are registered with the Client (`Client.ids`) before submitting the tasks?
>>
>> -MinRK
>>
>> On Wed, Jul 1, 2015 at 9:09 AM, Austin So <austin.so_at_tomabio.com> wrote:
>> I’ve been trying to figure out what I’m doing wrong here, and if it is an issue within the starcluster config file. I’ve exhausted all possible implementations in my code that I could think of.
>>
>> During setup, StarCluster reported at launch that 255 engines had been set up and were available to IPCluster.
>>
>> Within IPython Notebook, I’m trying to distribute a function across all my nodes and engines (5@r3.8xlarge).
>>
>> While the line of code is running, I watch qhost and see that only the master shows a CPU load. I look at the Cloud Metrics and likewise see that only the master shows a CPU load. At the suggestion of a friend, I returned the result of a socket.getfqdn() call to identify whether each result was processed by the master or by one of the nodes. All results returned were generated by the master.
>>
>> Any hints to identify where the source of the problem lies would be greatly appreciated.
>>
>> Best
>>
>> Austin

==============================

Austin P. So, Ph. D.
Director, Research and Development





353E Vintage Park Dr.
Foster City CA 94404

email: austin.so_at_tomabio.com

Received on Fri Jul 03 2015 - 13:17:30 EDT
This archive was generated by hypermail 2.3.0.