Hadoop can be configured in three different modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
This blog explains Local (Standalone) Mode, which is the easiest to configure. If you just want to get started and run a MapReduce job, this is a good place to begin.
Install Sun JDK
Download JDK 6 (latest update) from http://www.oracle.com/technetwork/java/javase/overview/index.html. I downloaded update 37 from the URL below:
http://www.oracle.com/technetwork/java/javase/downloads/jdk6u37-downloads-1859587.html
cd /scratch/rajiv/hadoop/hadoop-1.0.4
./jdk-6u37-linux-x64.bin
The JDK will be installed under the same folder (jdk1.6.0_37).
Download the latest stable Hadoop version
Download hadoop-1.0.4-bin.tar.gz from an Apache mirror, e.g. http://www.motorlogy.com/apache/hadoop/common/stable/. Then extract Hadoop:
tar -xvf hadoop-1.0.4-bin.tar.gz
Hadoop is now extracted under /scratch/rajiv/hadoop/hadoop-1.0.4.
Set JAVA_HOME for Hadoop
vi hadoop-1.0.4/conf/hadoop-env.sh
Uncomment the line below and update the path to point to JDK 1.6:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
I updated it to:
export JAVA_HOME=/scratch/rajiv/hadoop/jdk1.6.0_37
Run a sample MapReduce program
hadoop-1.0.4-bin.tar.gz ships with sample programs, packaged in hadoop-examples-1.0.4.jar. Let's try Grep.java from this jar.
The source code of this class is not included in the Hadoop distribution, but it can be viewed in Hadoop's version control system; the Hadoop SVN repository lets you browse the source code online. To view the source for Hadoop version 1.0.4, go to branch-1.0, click on Grep.java, and view the revision.
This job has three steps:
- Mapper - the Mapper class is set to RegexMapper
- Combiner - this is set to LongSumReducer
- Reducer - the Reducer class is set to LongSumReducer
Job configuration - prepares job parameters such as the input and output folders, and specifies the Mapper and Reducer classes
Job client - used to submit the job
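The steps above correspond to how Grep.java wires the job together. A minimal sketch of that configuration, assuming hadoop-core-1.0.4.jar (old org.apache.hadoop.mapred API) is on the classpath, could look like the following. This is not a copy of the real Grep.java, which additionally runs a second job to sort the counts:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.RegexMapper;

// Hypothetical abridged sketch of the Grep job configuration;
// requires hadoop-core-1.0.4.jar on the classpath to compile.
public class GrepSketch {
    public static void main(String[] args) throws Exception {
        JobConf grepJob = new JobConf(GrepSketch.class);
        grepJob.setJobName("grep-search");

        // Job configuration: input/output folders and the regex to match.
        FileInputFormat.setInputPaths(grepJob, new Path(args[0]));
        FileOutputFormat.setOutputPath(grepJob, new Path(args[1]));
        grepJob.set(RegexMapper.PATTERN, args[2]);

        // Mapper emits (matched string, 1); combiner and reducer sum the counts.
        grepJob.setMapperClass(RegexMapper.class);
        grepJob.setCombinerClass(LongSumReducer.class);
        grepJob.setReducerClass(LongSumReducer.class);
        grepJob.setOutputKeyClass(Text.class);
        grepJob.setOutputValueClass(LongWritable.class);

        // Job client: submits the job and waits for completion.
        JobClient.runJob(grepJob);
    }
}
```

This is a configuration fragment rather than a standalone program; it only runs inside a Hadoop installation, as in the commands below.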
$cd /scratch/rajiv/hadoop/hadoop-1.0.4
$mkdir inputfiles
$cd inputfiles
$wget http://hadoop.apache.org/index.html
$cd ..
$./bin/hadoop jar hadoop-examples-*.jar grep inputfiles outputfiles 'Apache'
The wget command downloads the index.html file into the inputfiles folder.
The hadoop command above reads index.html from the inputfiles folder, greps for occurrences of the string 'Apache', and writes the matched string and its count to the outputfiles folder.
Now examine the output folder
ls outputfiles/
_SUCCESS part-00000
The file part-00000 contains the output of the MapReduce job:
cat outputfiles/part-00000
46 Apache
In this mode, Hadoop runs as a single process.
While the above command is running, execute "ps -ef | grep RunJar" from another terminal; it shows a single Java process that invokes:
"org.apache.hadoop.util.RunJar hadoop-examples-1.0.4.jar grep inputfiles"
The source code of org.apache.hadoop.util.RunJar can also be browsed in the same repository.
The RunJar class essentially loads Grep.class from hadoop-examples-1.0.4.jar and executes its main method.
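The essence of what RunJar does can be sketched with plain reflection. In the sketch below, FakeGrep is a hypothetical stand-in for the jar's main class; the real RunJar unpacks the jar, builds a class loader over its contents, determines the main class from the jar manifest or the first argument, and invokes its static main with the remaining arguments:

```java
import java.lang.reflect.Method;

public class RunJarSketch {
    // Hypothetical stand-in for the example jar's main class (e.g. Grep).
    public static class FakeGrep {
        static String lastArgs = null;
        public static void main(String[] args) {
            lastArgs = String.join(" ", args);
        }
    }

    // Core of RunJar: resolve the main class by name and invoke its
    // static main(String[]) via reflection with the remaining arguments.
    static void invokeMain(ClassLoader loader, String mainClass, String[] args)
            throws Exception {
        Class<?> cls = Class.forName(mainClass, true, loader);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        // The real RunJar uses a URLClassLoader over the unpacked jar;
        // here we reuse the current classpath to keep the demo self-contained.
        invokeMain(RunJarSketch.class.getClassLoader(),
                   "RunJarSketch$FakeGrep",
                   new String[] {"inputfiles", "outputfiles", "Apache"});
        System.out.println("invoked with: " + FakeGrep.lastArgs);
    }
}
```

Running this prints "invoked with: inputfiles outputfiles Apache", mirroring how RunJar hands the command-line arguments through to the example program.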
Note that in Standalone mode the HDFS file system is not configured and the MapReduce program runs as a single Java process. To see Hadoop in full action you would need to configure Pseudo-Distributed Mode or Fully-Distributed Mode.