StarCluster - Mailing List Archive

Re: Configuring number of map jobs per cluster node {Hadoop plugin}

From: Paul McDonagh <no email>
Date: Fri, 1 Jun 2012 10:00:29 -0400

Hi Rayson,

Thanks for the link; I saw that too. However, clicking on the links behind
a) mapred.tasktracker.map.tasks.maximum or
b) mapred.tasktracker.reduce.tasks.maximum
to find out how to use the "configuration knobs" takes you to invalid webpages.

A little more info: I'm using R and hadoop together via the rmr package.

I've come up short on further searches for how and where to set those parameters. There's general discussion over whether users should even be allowed to set them. It's not clear to me whether these are parameters that would be set when Hadoop is initialized or on an individual job submission, but it would seem reasonable to be able to assign them on a per-compute-node basis at initialization if you had a heterogeneous cluster.
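If I'm reading the Hadoop 1.x material right, these are per-TaskTracker settings read from conf/mapred-site.xml on each node when the daemon starts (not per-job), so on a heterogeneous cluster each node could carry its own values. A minimal sketch of what that file might look like (the slot counts of 6 and 2 are just illustrative for an 8-core node, not a recommendation from the docs):

```xml
<!-- conf/mapred-site.xml on each TaskTracker node; values are illustrative -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value> <!-- max concurrent map tasks on this node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value> <!-- max concurrent reduce tasks on this node -->
  </property>
</configuration>
```

The TaskTracker daemons would need a restart to pick these up, as far as I can tell.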

In short, it appears there is currently no benefit to having nodes in a Hadoop cluster with more than 2 compute cores, which means I have to instantiate very large clusters of smaller EC2 nodes, with all the associated network and I/O latency that entails.
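(As a sketch of the kind of sizing I'd like to do: if one assumes roughly one map slot per core, reserving a couple of cores for the TaskTracker/DataNode daemons, a hypothetical rule of thumb for a c1.xlarge might look like this. The formula is just my guess, not anything from the Hadoop docs.)

```shell
#!/bin/bash
# Hypothetical slot-sizing sketch: derive per-node slot counts from core count.
# Assumption: reserve ~2 cores for the Hadoop daemons, give the rest to maps,
# and allow one reduce slot per 4 cores.
CORES=8   # c1.xlarge reports 8 virtual cores
MAP_SLOTS=$(( CORES > 2 ? CORES - 2 : 1 ))
REDUCE_SLOTS=$(( CORES / 4 > 1 ? CORES / 4 : 1 ))
echo "mapred.tasktracker.map.tasks.maximum=${MAP_SLOTS}"
echo "mapred.tasktracker.reduce.tasks.maximum=${REDUCE_SLOTS}"
```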

Any thoughts?

Paul.


On May 31, 2012, at 15:38, Rayson Ho wrote:

> While integrating some user contributed Hadoop docs into the Open Grid
> Scheduler website, I came across the
> "mapred.tasktracker.map.tasks.maximum" parameter - a quick Google
> search points me to:
>
> Q: I see a maximum of 2 maps/reduces spawned concurrently on each
> TaskTracker, how do I increase that?
> A: Use the configuration knob: mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum to control the number of
> maps/reduces spawned simultaneously on a TaskTracker. By default, it
> is set to 2, hence one sees a maximum of 2 maps and 2 reduces at a
> given instance on a TaskTracker.
>
> Ref: http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
>
> Maybe it is a matter of setting the parameter?
>
> Rayson
>
> ================================
> Open Grid Scheduler / Grid Engine
> http://gridscheduler.sourceforge.net/
>
> Scalable Grid Engine Support Program
> http://www.scalablelogic.com/
>
>
>
> On Wed, May 30, 2012 at 2:07 PM, Paul McDonagh <mcdonaghpd_at_gmail.com> wrote:
>> Thanks for creating starcluster, it's great. I'm using the Hadoop plugin and I'm working on a c1.xlarge instance type. The c1.xlarge type has 20 EC2 Compute units or 8 virtual cores.
>>
>> When looking at the job tracking webpages that are set up after the cluster is initiated and running, there is a limit of 2 map jobs per cluster node. How can I alter the number map (or reduce) jobs a particular compute node can run? I can't seem to find how to change this. I'd like to be able to use much more of the compute resources for some of the larger compute instance types.
>>
>> Thanks for your help.
>> Paul McDonagh
>>
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster_at_mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>
>
> --
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
Received on Fri Jun 01 2012 - 10:00:35 EDT