We will describe Hadoop setup on a single-node and a multi-node system. The Hadoop environment setup and configuration are described in detail. First, download the following software (RPM packages):
- Java JDK RPM
- Apache Hadoop 0.20.204.0 RPM
A) Single node system Hadoop setup
1) Install the JDK on a Red Hat or CentOS 5+ system.
$ ./jdk-6u26-linux-x64-rpm.bin
Java is now installed under /usr/java/default; this is the path to use for JAVA_HOME.
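To confirm the installation, you can run the JVM directly from that location:
$ /usr/java/default/bin/java -version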
2) Install Apache Hadoop 0.20.204.
$ rpm -i hadoop-0.20.204.0-1.i386.rpm
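As a quick sanity check that the package installed and the hadoop command is on your PATH:
$ hadoop version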
3) Set up the Apache Hadoop configuration and start the Hadoop processes.
$ /usr/sbin/hadoop-setup-single-node.sh
The setup wizard will guide you through a series of questions to set up Hadoop. Hadoop should be running after you answer ‘Y’ to all of them.
Create a user account on HDFS for yourself.
$ /usr/sbin/hadoop-create-user.sh -u $USER
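To check that the account was created, list your HDFS home directory (assuming the script follows the usual HDFS convention of creating it under /user):
$ hadoop fs -ls /user/$USER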
B) Multi-node Hadoop setup
1) Install both the JDK and Hadoop 0.20.204.0 RPMs on all nodes.
2) Generate the Hadoop configuration on all nodes:
$ /usr/sbin/hadoop-setup-conf.sh \
--namenode-url=hdfs://${namenode}:9000/ \
--jobtracker-url=${jobtracker}:9001 \
--conf-dir=/etc/hadoop \
--hdfs-dir=/var/lib/hadoop/hdfs \
--namenode-dir=/var/lib/hadoop/hdfs/namenode \
--mapred-dir=/var/lib/hadoop/mapred \
--datanode-dir=/var/lib/hadoop/hdfs/data \
--log-dir=/var/log/hadoop \
--auto
Here ${namenode} and ${jobtracker} should be replaced with the hostnames of the namenode and the jobtracker, respectively.
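For example, with hypothetical hosts master1 (namenode) and master2 (jobtracker), the first two options would read:
--namenode-url=hdfs://master1:9000/ \
--jobtracker-url=master2:9001 \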
3) Format the namenode and set up the default HDFS layout.
$ /usr/sbin/hadoop-setup-hdfs.sh
4) Start all data nodes.
$ /etc/init.d/hadoop-datanode start
5) Start the jobtracker node.
$ /etc/init.d/hadoop-jobtracker start
6) Start the tasktracker nodes.
$ /etc/init.d/hadoop-tasktracker start
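Each init script starts the corresponding daemon on the local node. To confirm which Hadoop daemons are running, use the JDK's jps tool; on a worker node the listing should include DataNode and TaskTracker:
$ jps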
7) Create a user account on HDFS for yourself.
$ /usr/sbin/hadoop-create-user.sh -u $USER
C) Setup Environment for Hadoop
$ vi ~/.bash_profile
In INSERT mode, set the path for JAVA_HOME and export it.
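For example, assuming the JDK RPM's default install location of /usr/java/default (the PATH line is optional but convenient):
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH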
Save the file by pressing Esc, then typing :wq.
Reload .bash_profile so the changes take effect:
$ source ~/.bash_profile
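To confirm the variable is set in the current shell:
$ echo $JAVA_HOME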
Set the JAVA_HOME path in the Hadoop environment file:
$ vi /etc/hadoop/hadoop-env.sh
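In that file, uncomment and set the JAVA_HOME line, e.g.:
export JAVA_HOME=/usr/java/default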
D) Configuration for Hadoop
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
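Note that with the RPM-based setup above, these files live under /etc/hadoop rather than a conf/ directory. Restart the affected daemons after editing so the changes take effect, for example:
$ /etc/init.d/hadoop-datanode stop
$ /etc/init.d/hadoop-datanode start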
E) Hadoop Commands
$ hadoop (Prints the list of available commands)
$ hadoop namenode -format (Format the namenode; answer ‘Y’ if asked to confirm)
$ hadoop namenode (Start the namenode in the foreground)
$ find / -name start-dfs.sh (Locate the script on the filesystem)
$ cd /usr/sbin (Change to the directory that contains it)
$ start-dfs.sh
$ start-mapred.sh
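Once DFS and MapReduce are up, a quick end-to-end check is to run one of the bundled example jobs. The jar location below is an assumption for the RPM layout; adjust it to wherever hadoop-examples-0.20.204.0.jar landed on your system:
$ hadoop jar /usr/share/hadoop/hadoop-examples-0.20.204.0.jar pi 2 10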
$ hadoop fs -ls / (Shows the HDFS root folder)
$ hadoop fs -put input/file01 /input/file01 (Copy local input/file01 to HDFS /input/file01)
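To verify the copy, read the file back from HDFS:
$ hadoop fs -cat /input/file01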