StarCluster - Mailing List Archive

Re: New Grid Engine Hadoop Integration HOWTO

From: Rayson Ho <no email>
Date: Fri, 1 Jun 2012 15:59:29 -0400

Hi Paul,

I started a new mail thread because our setup is different from
yours: the Hadoop Grid Engine integration documented in the HOWTO
does not use Hadoop with R. In your setup, with the rmr integration,
R invokes Hadoop directly, so it has to bypass the batch queuing
capabilities of Grid Engine.

I believe the best thing to do with your setup is to change the
Hadoop StarCluster Plugin so that it creates the needed
mapred-site.xml file before the Hadoop daemons are brought up.

I believe you can parse /proc/cpuinfo or run Grid Engine's loadcheck
on each node to get the number of processors.

% loadcheck
num_proc 4
m_socket 1
m_core 2
m_topology SCTTCTT
load_short 0.00
load_medium 0.00
load_long 0.00
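
As a rough sketch of the step above, the processor count could be pulled
out of either source with a few lines of Python. The function names here
are made up for illustration, and the code assumes that loadcheck is on
the PATH and prints "key value" lines like the ones shown:

```python
import subprocess


def parse_num_proc(loadcheck_output):
    """Return num_proc from loadcheck-style 'key value' lines, or None."""
    for line in loadcheck_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "num_proc":
            return int(fields[1])
    return None


def count_cpuinfo_processors(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "processor":
            count += 1
    return count


def count_processors(loadcheck_cmd="loadcheck"):
    """Ask loadcheck first; fall back to /proc/cpuinfo if that fails."""
    try:
        out = subprocess.check_output([loadcheck_cmd], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        out = ""
    num_proc = parse_num_proc(out)
    if num_proc is None:
        with open("/proc/cpuinfo") as f:
            num_proc = count_cpuinfo_processors(f.read())
    return num_proc
```

Calling count_processors() on a node would return an integer that can be
passed on to whatever writes the configuration file.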

Then add that to the mapred-site.xml file in the following XML format:

 <value> No. of Processors </value>

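A minimal sketch of generating such a file from Python follows. The
surrounding configuration/property/name layout is the usual Hadoop
site-file structure. The property name did not survive in this archived
message, so the helper takes it as a parameter, and the name used in the
usage line is only an assumption (a Hadoop 1.x TaskTracker setting), not
something taken from the message:

```python
import xml.etree.ElementTree as ET


def build_mapred_site(properties):
    """Build mapred-site.xml content from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Assumed property name, for illustration only.
ASSUMED_PROPERTY = "mapred.tasktracker.map.tasks.maximum"

if __name__ == "__main__":
    # 4 is the num_proc value from the loadcheck output above.
    print(build_mapred_site({ASSUMED_PROPERTY: 4}))
```

A plugin would write the returned string to mapred-site.xml in the Hadoop
configuration directory on each node before starting the daemons.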

Open Grid Scheduler / Grid Engine

Scalable Grid Engine Support Program

On Fri, Jun 1, 2012 at 3:31 PM, Paul McDonagh wrote:
> Thanks Rayson,
> This and the previous email are a couple of really good suggestions. I'll try 'em out and see what happens.
> Best,
> Paul.
> On Jun 1, 2012, at 14:52, Rayson Ho wrote:
>> If you are running Hadoop on StarCluster, you may also be interested
>> in this new method contributed by Prakashan Korambath of UCLA.
>> The difference between the original SGE 6.2u5 method vs the new one is
>> that with Prakashan's approach, Grid Engine is used for resource
>> allocation, and the Hadoop job scheduler/Job Tracker is used to handle
>> all the MapReduce operations. A Hadoop cluster is created on demand
>> with Prakashan's approach, but in the original SGE 6.2u5 method Grid
>> Engine replaces the Hadoop job scheduler.
>> As standard Grid Engine PEs are used in this new approach, one can
>> call "qrsh -inherit" and use Grid Engine's method to start Hadoop
>> services on remote nodes, and thus get full job control, job
>> accounting, and cleanup at termination, like any other tightly
>> integrated PE job!
>> Rayson

Received on Fri Jun 01 2012 - 15:59:30 EDT

