I am new to StarCluster, but I have found the package extremely useful for
running a parameter-sweep grid search to train sklearn models. With my
particular problem, each job requires a large amount of memory relative to
the number of CPUs (compensating with memory-optimized instances alone is
not sufficient, since each job takes ~40 GB of memory while training a
model). Thus, I
needed to limit the number of ipengines on each node in the cluster. I
edited the ipcluster plugin such that it supports this optional parameter,
with the default behavior matching that of the original implementation.
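For context, here is how the option might look in the StarCluster config
file, alongside the existing ipcluster plugin settings (the parameter name
engines_per_node is a placeholder of my choosing, not necessarily the name
used in my change):

```ini
[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
# Hypothetical parameter: cap the number of ipengines started on each node.
# Omitting it would preserve the original behavior (one engine per CPU).
engines_per_node = 1
```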
I believe that others may find this modification useful, and I would love
feedback on whether or not such a change is interesting to the team.
There are two oddities with the implementation that I wish to discuss:

1. It requires the IPClusterRestartEngines plugin to also specify the
   number of engines per node.
2. It likely requires changes depending on the instance type.
   Alternatively, it would be trivial to specify an amount of memory per
   engine, i.e. start an engine for every 40 GB of memory; this, however,
   may be difficult to explain.
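The memory-per-engine alternative could be sketched roughly as follows (a
minimal illustration, not code from my change; the function name and the
cpu cap are my own assumptions):

```python
# Hypothetical sizing rule: one ipengine per ENGINE_MEM_GB of node RAM,
# never more engines than CPUs, and at least one engine per node.
ENGINE_MEM_GB = 40  # approximate peak memory of one training job

def engines_for_node(node_mem_gb, n_cpus):
    """Return how many ipengines to launch on a node of the given size."""
    by_memory = node_mem_gb // ENGINE_MEM_GB
    return max(1, min(n_cpus, by_memory))

# e.g. a memory-optimized node with 244 GB RAM and 32 vCPUs:
print(engines_for_node(244, 32))  # -> 6
```

The downside, as noted above, is that this rule is harder to document than
a plain per-node engine count.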
I have created a pull request for this change,
but I wanted to reach out to the mailing list for discussion and feedback.
Thanks for sharing this wonderful project, and I hope others find the
ability to limit the number of engines useful.
Received on Fri Mar 07 2014 - 00:54:45 EST