Hadoop installation modes – Let’s explore

Hadoop modes

Overview: Apache Hadoop can be installed in different modes, depending on the requirement. The mode is configured during installation. By default, Hadoop is installed in standalone mode. The other modes are pseudo-distributed mode and distributed mode. The purpose of this article is to explain the different installation modes in a simple way, so that readers can follow along and set up their own installation.

In this article, I will discuss the different installation modes and their details.

Introduction: We all know that Apache Hadoop is an open source framework that allows distributed processing of large data sets across clusters of computers using simple programming models. Hadoop can scale from a single server to thousands of machines. Under these conditions, installing Hadoop correctly becomes critical. We can install Hadoop in three different modes:

Standalone mode: single node cluster
Pseudo-distributed mode: single node cluster
Distributed mode: multi-node cluster

Purpose of different installation modes: When Apache Hadoop is used in a production environment, multiple server nodes are used for distributed computing. But for understanding the basics and playing around with Hadoop, a single node installation is sufficient. There is another mode known as 'pseudo-distributed' mode. This mode is used to simulate a multi-node environment on a single server.

In this document we will discuss how to install Hadoop on Ubuntu Linux. Whatever the mode, the system should have Java version 1.6.x installed.

Standalone mode installation: Now, let us check the standalone mode installation process by following the steps mentioned below.

Install Java
Java (JDK version 1.6.x), either from Sun/Oracle or OpenJDK, is required.

Step 1 – If you cannot switch to OpenJDK and need to use the proprietary Sun JDK/JRE instead, install sun-java6 from the Canonical Partner Repository using the commands below.

Note: The Canonical Partner Repository contains closed-source third-party software that is free of cost. Canonical does not have access to the source code; it only packages and tests the software.

Add the Canonical Partner Repository to your apt repositories with:

[Code]

$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

[/Code]

Step 2 – Update the source list.

[Code]

$ sudo apt-get update

[/Code]

Step 3 – Install JDK version 1.6.x from Sun/Oracle.

[Code]

$ sudo apt-get install sun-java6-jdk

[/Code]

Step 4 – Once the JDK installation is complete, make sure it is set up correctly by checking that the reported version is 1.6.x from Sun/Oracle.

[Code]

user@ubuntu:~# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b02)
Java HotSpot(TM) Client VM (build 16.4-b01, mixed mode, sharing)

[/Code]
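
The version check in Step 4 can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name is_java_1_6 is hypothetical, and it only looks for a version string of the form shown above. Note that java -version writes to standard error, so the usage line redirects it.

```shell
# Succeeds if the text on stdin reports a 1.6.x Java version,
# i.e. contains a string such as: version "1.6.0_45"
is_java_1_6() {
  grep -q 'version "1\.6\.'
}

# Usage (not executed here; requires java on the PATH):
#   java -version 2>&1 | is_java_1_6 && echo "Java 1.6.x found"
```

If the check fails, revisit Steps 1 to 3 before continuing.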

Add Hadoop user

Step 5 – Add a dedicated Hadoop Unix user to your system, as shown below, to isolate this installation from other software:

[Code]

$ sudo adduser hadoop_admin

[/Code]

Download the Hadoop binary and install it

Step 6 – Download Apache Hadoop from the Apache website. Hadoop is distributed as a tar.gz archive. Copy this archive into the /usr/local/installables folder. The installables folder must be created under /usr/local before this step. Now run the following commands with sudo:

[Code]

$ cd /usr/local/installables
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo chown -R hadoop_admin /usr/local/installables/hadoop-0.20.2

[/Code]

Define the environment variable JAVA_HOME

Step 7 – Open the Hadoop configuration file (hadoop-env.sh) at /usr/local/installables/hadoop-0.20.2/conf/hadoop-env.sh and define JAVA_HOME as shown below:

[Code] export JAVA_HOME=/path/where/jdk/is/installed [/Code]

(e.g. /usr/lib/jvm/java-6-sun; JAVA_HOME must point to the JDK directory, not to the java binary itself)
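
If you are unsure where the JDK lives, one option is to take the full path of the java launcher and strip the trailing /bin/java. The sketch below assumes the conventional layout <JAVA_HOME>/bin/java or <JAVA_HOME>/jre/bin/java; the helper name java_home_from_binary is hypothetical and not part of Hadoop.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary.
# Assumes the usual layout <JAVA_HOME>/bin/java or <JAVA_HOME>/jre/bin/java.
java_home_from_binary() {
  java_bin="$1"
  home="${java_bin%/bin/java}"   # strip the trailing /bin/java
  home="${home%/jre}"            # strip a trailing /jre, if present
  printf '%s\n' "$home"
}

# Usage (not executed here): resolve symlinks first, then derive the directory.
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

The printed directory is only a candidate; check that it contains bin/java before writing it into hadoop-env.sh.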

Installation in standalone mode

Step 8 – Now go to the HADOOP_HOME directory (location where HADOOP is extracted) and run the following command –

[Code]

$ bin/hadoop

[/Code]

The following output will be displayed –

[Code] Usage: hadoop [--config confdir] COMMAND

[/Code]

Some of the COMMAND options are mentioned below. There are other options available and can be checked using the command mentioned above.

[Code]
namenode -format     format the DFS filesystem
secondarynamenode    run the DFS secondary namenode
namenode             run the DFS namenode
datanode             run a DFS datanode
dfsadmin             run a DFS admin client
mradmin              run a Map-Reduce admin client
fsck                 run a DFS filesystem checking utility

[/Code]

The above output indicates that the standalone installation has completed successfully. You can now run any of the bundled examples by calling:

[Code] $ bin/hadoop jar hadoop-*-examples.jar <NAME> <PARAMS>[/Code]

Pseudo-distributed mode installation: This is a simulated multi-node environment running on a single-node server.
Here the first step is to configure SSH, which is used to access and manage the different nodes. SSH access to the different nodes is therefore mandatory. Once SSH is configured, enabled and accessible, we can start the Hadoop configuration. The following configuration files need to be modified:

conf/core-site.xml
conf/hdfs-site.xml
conf/mapred-site.xml

Open all the configuration files in the vi editor and update the configuration.

Configure the core-site.xml file:

[Code]$ vi conf/core-site.xml[/Code]

[Code]
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
[/Code]

Configure the hdfs-site.xml file:

[Code]$ vi conf/hdfs-site.xml[/Code]

[Code]
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
[/Code]

Configure the mapred-site.xml file:

[Code]$ vi conf/mapred-site.xml[/Code]

[Code]
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
[/Code]

Once these changes are done, we need to format the name node by using the following command. The command prompt will show a series of messages, ending with a success message.

[Code]$ bin/hadoop namenode -format[/Code]

Now our setup is done for the pseudo-distributed node. Let's now start the single node cluster by using the following command. It will again print a set of messages on the command prompt and start the server processes.

[Code]$ bin/start-all.sh[/Code]

Now we should check the status of the Hadoop processes by executing the jps command as shown below. It will list all the running Java processes.

[Code]
$ jps
14799 NameNode
14977 SecondaryNameNode
15183 DataNode
15596 JobTracker
15897 TaskTracker
[/Code]
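
Reading the jps listing by eye works, but the check can also be automated. The following is a minimal sketch under the assumption that jps prints one "<pid> <name>" pair per line; the helper name missing_daemons is hypothetical. It prints every expected daemon that is absent from the listing and prints nothing when all of them are running.

```shell
# Print each expected daemon that does not appear in jps output read from stdin.
# Assumes jps prints one "<pid> <name>" pair per line.
missing_daemons() {
  running="$(awk '{print $2}')"   # keep only the process names
  for daemon in "$@"; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$running" | grep -qx "$daemon" || printf '%s\n' "$daemon"
  done
}

# Usage (not executed here; requires a running cluster):
#   jps | missing_daemons NameNode SecondaryNameNode DataNode JobTracker TaskTracker
```

An empty result means all five daemons of the pseudo-distributed setup are up.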

Stopping the Single node Cluster: We can stop the single node cluster by using the following command. The command prompt will display all the stopping processes.

[Code]
$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
[/Code]

Distributed mode installation:
Before we start the distributed mode installation, we must ensure that we have the pseudo distributed setup done and we have at least two machines, one acting as master and the other one acting as a slave. Now we run the following commands in sequence.

· $ bin/stop-all.sh – Make sure none of the nodes are running

Open the /etc/hosts file and add the following entries for master and slave –

<IP ADDRESS> master

<IP ADDRESS> slave

$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave – This command should be executed on the master to enable passwordless SSH to the slave. We should log in with the same username on all the machines. If a password is needed, we can set it manually.
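
The ssh-copy-id command expects a key pair to exist already at $HOME/.ssh/id_rsa. The sketch below, with the hypothetical helper name ensure_keypair, creates an RSA key pair with an empty passphrase only when none is present. An empty passphrase is an assumption made here so that the Hadoop scripts can log in unattended; adjust it to your security requirements.

```shell
# Create an RSA key pair with an empty passphrase unless one already exists.
ensure_keypair() {
  key="$1"
  if [ -f "$key" ] && [ -f "$key.pub" ]; then
    echo "key pair already present: $key"
    return 0
  fi
  mkdir -p "$(dirname "$key")"
  # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: output file
  ssh-keygen -q -t rsa -N "" -f "$key"
}

# Usage on the master (not executed here):
#   ensure_keypair "$HOME/.ssh/id_rsa"
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" slave
```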
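Now we open the two files conf/masters and conf/slaves on the master. The conf/masters file lists the host on which the secondary NameNode is started, which is the master in our multi-node cluster. The conf/slaves file lists the hosts where the Hadoop slave daemons (DataNode and TaskTracker) will be running.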
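
Both files are plain text with one hostname per line. As a rough sketch for the two-machine setup described here, using the hostnames master and slave from the /etc/hosts entries above, the files could be written as follows. The helper name write_host_lists is hypothetical.

```shell
# Write the host lists read by the start/stop scripts, one hostname per line.
# conf_dir is the conf directory of the Hadoop installation on the master.
write_host_lists() {
  conf_dir="$1"
  printf '%s\n' master > "$conf_dir/masters"
  printf '%s\n' slave  > "$conf_dir/slaves"
}

# Usage from the Hadoop directory on the master (not executed here):
#   write_host_lists conf
```

With more than one slave machine, conf/slaves would simply contain one line per host.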
Edit the conf/core-site.xml file to have the following entries:

<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value>
</property>

Edit the conf/mapred-site.xml file to have the following entries. Note that mapred.job.tracker takes a plain host:port pair, without a URI scheme:

<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
</property>

Edit the conf/hdfs-site.xml file to set the replication factor, i.e. the number of copies kept of each block:

<property>
<name>dfs.replication</name>
</property>

Edit the conf/mapred-site.xml file again to set the local directory and the number of map and reduce tasks:

<property>
<name>mapred.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
</property>

<property>
<name>mapred.map.tasks</name>
</property>

<property>
<name>mapred.reduce.tasks</name>
</property>

Now start the master by using the following command.

[Code] $ bin/start-dfs.sh [/Code]

Once started, check the status on the master by using jps command. You should get the following output –

[Code]

14799 NameNode

15314 Jps
14977 SecondaryNameNode

[/Code]

On the slave the output should be as shown below.

[Code]

15183 DataNode
15616 Jps

[/Code]

Now start the MapReduce daemons by using the following command.

[Code]

$ bin/start-mapred.sh

[/Code]

Once started check the status on the master by using jps command. You should get the following output –

[Code]

16017 Jps

14799 NameNode

15596 JobTracker

14977 SecondaryNameNode

[/Code]

And on the slaves the output should be as shown below.

[Code]

15183 DataNode

15897 TaskTracker
16284 Jps

[/Code]

Summary: In the above discussion we have covered different Hadoop installation modes and their technical details. But we should be careful when selecting the installation mode. Different modes have their own purpose. So the beginners should start with single mode installation and then proceed with other options.
Let us summarize our discussion with the following bullets

Apache Hadoop can be installed in three different modes –
- Standalone (single node)
- Pseudo-distributed (single node)
- Distributed (multi-node)
Single mode is the simplest way to install and get started.
If we need clusters but have only one node available, then we should go for Pseudo distributed mode
To install the distributed mode we should have the pseudo distributed mode installed first.


TechAlpine – All About Technology

www.techalpine.com
