Tuesday, December 29, 2015

How to install Hadoop on Linux - Single node setup (Pseudo-Distributed Mode)

This blog explains how to install and configure Hadoop in pseudo-distributed mode on Linux (Oracle Enterprise Linux), and covers common errors encountered during setup.

Install Sun JDK

Download JDK 6 (latest update) from http://www.oracle.com/technetwork/java/javase/overview/index.html
 I downloaded update 37 from the URL below:
 http://www.oracle.com/technetwork/java/javase/downloads/jdk6u37-downloads-1859587.html

cd /softwares/hadoop
./jdk-6u37-linux-x64.bin
The JDK will be installed under the same folder (jdk1.6.0_37).
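To confirm the JDK works, run its java binary (path as per the install location above):

$ /softwares/hadoop/jdk1.6.0_37/bin/java -version

This should report java version "1.6.0_37".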


Download the latest stable Hadoop version

Download hadoop-1.0.4-bin.tar.gz from the Apache mirror http://www.motorlogy.com/apache/hadoop/common/stable/

Extract Hadoop:
 tar -xvf hadoop-1.0.4-bin.tar.gz

Hadoop is now extracted under /scratch/rajiv/softwares/hadoop/hadoop-1.0.4


Set JAVA_HOME for Hadoop

 vi hadoop-1.0.4/conf/hadoop-env.sh

Uncomment the line below and update the path to JDK 1.6:

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

I updated it to:
 export JAVA_HOME=/softwares/hadoop/jdk1.6.0_37


Configure SSH

You will need an SSH server (daemon) and client on your box. Install them (as root) with:

$ yum -y install openssh-server openssh-clients

If the yum repository is not configured, refer to this post.



Check whether the SSH server daemon sshd is running:
$ /sbin/service sshd status
If sshd is not running, start it using the command below:

$ /sbin/service sshd start


Alternatively, to determine whether SSH is running, enter the following command:
$ pgrep sshd

If SSH is running, this command returns one or more process IDs.


Run "which ssh" to check if you have ssh client installed.

Generate an SSH key (passphraseless)
$ ssh-keygen -t rsa -P ""

This creates an RSA key pair under the home folder with an empty passphrase.
Verify that the id_rsa and id_rsa.pub files are present under the ~/.ssh folder.

Copy the public key to authorized keys

Append id_rsa.pub to authorized_keys:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
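Note: if ssh still prompts for a password after this, check the permissions; sshd typically ignores an authorized_keys file that is group- or world-writable. The usual fix is:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys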

To test the SSH configuration, run "ssh localhost". The output below confirms that SSH is working:
    The authenticity of host 'localhost (127.0.0.1)' can't be established.

    RSA key fingerprint is 89:20:58:b4:06:c6:c6:5a:08:1e:43:eb:cc:e5:45:49.

    Are you sure you want to continue connecting (yes/no)? yes

    Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Modify Hadoop config files

1) Modify core-site.xml to add two properties under the configuration tag.

a) hadoop.tmp.dir - if this property is not configured, HDFS data goes under the default "/tmp/hadoop-${user.name}/dfs", which may be cleared on reboot.
b) fs.default.name - if this property is not configured, startup of the secondarynamenode will fail.

Sample conf/core-site.xml
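A minimal version would look like the snippet below; the hadoop.tmp.dir path is just an example location, so use any writable folder:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/scratch/rajiv/softwares/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>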

2) Modify hdfs-site.xml
Add the dfs.replication property and set its value to 1 (a single-node setup has only one DataNode, so blocks cannot be replicated further).


Sample hdfs-site.xml
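A minimal version for this single-node setup:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>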

3) Modify mapred-site.xml
Add the mapred.job.tracker property and set its value to "localhost:9001".

Sample mapred-site.xml
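A minimal version:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>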
Format the HDFS file system

Run the command below:

$ ./hadoop-1.0.4/bin/hadoop namenode -format

Start Hadoop

/scratch/rajiv/softwares/hadoop/hadoop-1.0.4/bin/start-all.sh


Make sure all Hadoop processes are running

Run "$JAVA_HOME/bin/jps" and make sure the following processes are running: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker.


Stop Hadoop

./bin/stop-all.sh  

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode


Common errors and troubleshooting steps

1) If SSH is not configured, the following errors are displayed while starting Hadoop:

$ ./start-all.sh

starting namenode, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-rajiv-namenode-myhostname.out
rajiv@localhost's password:
rajiv@localhost's password: localhost: Permission denied, please try again.
localhost: Permission denied, please try again.
rajiv@localhost's password:
localhost: starting datanode, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/bin/../logs/hadoop-rajiv-datanode-myhostname.out

2) If HDFS is not configured, starting Hadoop fails with the following message:

$ ./start-all.sh

starting namenode, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-rajiv-namenode-myhostname.out
localhost: starting datanode, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/bin/../logs/hadoop-rajiv-datanode-myhostname.out
localhost: starting secondarynamenode, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/bin/../logs/hadoop-rajiv-secondarynamenode-myhostname.out
localhost: Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:222)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:161)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
starting jobtracker, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/libexec/../logs/hadoop-rajiv-jobtracker-myhostname.out
localhost: starting tasktracker, logging to /scratch/rajiv/softwares/hadoop/hadoop-1.0.4/bin/../logs/hadoop-rajiv-tasktracker-myhostname.out

To fix this, make sure core-site.xml has the snippet below:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

3) "hadoop namenode -format" fails with "Cannot create directory":

$ ./bin/hadoop namenode -format

13/03/26 21:49:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhostname/myipaddress
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
13/03/26 21:49:33 INFO util.GSet: VM type = 64-bit
13/03/26 21:49:33 INFO util.GSet: 2% max memory = 17.77875 MB
13/03/26 21:49:33 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/03/26 21:49:33 INFO util.GSet: recommended=2097152, actual=2097152
13/03/26 21:49:34 INFO namenode.FSNamesystem: fsOwner=rajiv
13/03/26 21:49:34 INFO namenode.FSNamesystem: supergroup=supergroup
13/03/26 21:49:34 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/03/26 21:49:34 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/03/26 21:49:34 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/03/26 21:49:34 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/03/26 21:49:34 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /u01/hadoop/tmp/dfs/name/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:297)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1320)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1339)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1164)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)

Cause: an invalid tmp folder is specified. To fix this, check core-site.xml and ensure the hadoop.tmp.dir element has a correct, writable value.

4) While running the basic example, the following error is displayed:

$ ./bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

13/05/07 23:04:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/05/07 23:04:22 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/07 23:04:22 INFO mapred.JobClient: Cleaning up the staging area file:/scratch/softwares/hadoop/tmp/mapred/staging/rajiv762105165/.staging/job_local_0001
13/05/07 23:04:22 ERROR security.UserGroupInformation: PriviledgedActionException as:rajiv cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/rajiv/input
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/rajiv/input
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Cause:

The input folder is not present in HDFS:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/rajiv/input


Solution:


Check whether the input folder exists:

./bin/hadoop dfs -ls hdfs:/user/rajiv

If the input folder is not present, copy it from the local file system:

./bin/hadoop dfs -copyFromLocal ./input hdfs:/user/rajiv/input

The command above assumes that the input folder is present under the current directory.
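If the local input folder itself does not exist yet, a quick way to create one (following the Apache single-node guide) is to copy the Hadoop XML config files into it:

$ mkdir input
$ cp conf/*.xml input
$ ./bin/hadoop dfs -copyFromLocal ./input hdfs:/user/rajiv/input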


5) If Hadoop is not started, running the example displays the error below:

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
13/05/07 23:09:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
13/05/07 23:09:06 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
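Solution: start Hadoop first (bin/start-all.sh) and verify with jps that the NameNode is up before re-running the example.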



For a complete MapReduce example (WordCount v1.0), see the official tutorial:
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Example%3A+WordCount+v1.0