There shouldn't be too much I/O unless I'm missing something.
In IPython, I read the data from an HDF store on each node (once), then
instantiate a class on each node with the data:
%%px
store = pd.HDFStore(data_file, 'r')
rows = store.select('results', ['cv_score_mean > 0'])
rows = rows.sort('cv_score_mean', ascending=False)
rows['results_index'] = rows.index
# This doesn't take too long.
model_analytics = ResultsAnalytics(rows, store['data_model'])
---
## This dispatch takes between 1.5 and 5 minutes
## 66K jobs
ar = lview.map(lambda x: model_analytics.generate_prediction_heuristic(x),
               rows_index)
---
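One thing I'm wondering about on the dispatch side: with the default
chunksize, every element of rows_index becomes its own task, so 66K
elements means 66K scheduling round-trips through the hub. Below is a
minimal sketch of batching them, assuming lview is a load_balanced_view()
(whose map() accepts a chunksize argument); the value 100 is just an
illustrative guess, not something I've tuned.

# Sketch: batch elements so each task carries 100 of them instead of 1,
# cutting the number of scheduler round-trips by roughly 100x.
ar = lview.map(lambda x: model_analytics.generate_prediction_heuristic(x),
               rows_index, chunksize=100)

Fewer, larger tasks should shrink the dispatch window, at the cost of
coarser load balancing.
---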
ar.wait_interactive(interval=1.0)
63999/66230 tasks finished after 2181 s
done
So the whole run takes a while, though each job itself is relatively short.
But I don't understand why CPU isn't the limiting factor.
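To try to pin that down, my plan is to compare the summed per-task compute
time against the wall-clock time of the run, using the timing attributes on
the AsyncResult (serial_time, wall_time, and the per-task metadata
timestamps). A rough sketch of what I mean, nothing I've run yet:

ar.wait()

# Total compute time summed over all tasks vs. wall-clock time of the run.
# If serial_time divided by the number of engines is much smaller than
# wall_time, the engines are mostly idle: waiting on the scheduler,
# serialization, or I/O rather than computing.
print("serial compute time: %.1f s" % ar.serial_time)
print("wall time:           %.1f s" % ar.wall_time)

# Per-task timestamps show where the gap is (queue wait vs. actual run time).
md = ar.metadata[0]
print("queue wait: %s" % (md['started'] - md['submitted']))
print("run time:   %s" % (md['completed'] - md['started']))

If the per-task run times are short but the queue waits are long, that would
point at dispatch/serialization overhead or I/O wait rather than compute.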
Rajat, thanks for recommending dstat.
Best,
Chris
On Thu, Jul 30, 2015 at 10:52 AM Jacob Barhak <jacob.barhak_at_gmail.com>
wrote:
> Hi Christopher,
>
> Do you have a lot of I/O? For example, writing and reading many files in
> the same NFS location?
>
> This may explain things.
>
> Jacob
> On Jul 30, 2015 2:34 AM, "Christopher Clearfield" <
> chris.clearfield_at_system-logic.com> wrote:
>
>> Hi All,
>> I'm running a set of about 60K relatively short jobs that takes about 30
>> minutes to run in total. This is through IPython parallel.
>>
>> Yet my CPU utilization levels are relatively small:
>>
>> queuename qtype resv/used/tot. load_avg arch states
>> ---------------------------------------------------------------------------------
>> all.q@master BIP 0/0/2 0.98 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node001 BIP 0/0/8 8.01 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node002 BIP 0/0/8 8.07 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node003 BIP 0/0/8 7.96 linux-x64
>>
>> (I disabled the IPython engines on master because I was having heartbeat
>> timeout issues with the worker engines on my nodes, which explains why the
>> load on master is so low.)
>>
>> But CPU utilization on the nodes is only ~8%. Is that expected?
>>
>> Thanks,
>> Chris
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>