Installing Hadoop, Yarn and HBase on your Linux in 10 minutes

1 March 2019, by Eric Deleforterie

This is a howto for installing Hadoop (HDFS and MapReduce), Yarn and HBase on your Linux box in 10 minutes (after downloading the binaries).

Prerequisites

Install Java (OpenJDK) and find the Java home. On Ubuntu, look in /usr/lib/jvm and choose Java 8, not Java 11, which for the moment will give you some trouble with Hadoop.
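One way to locate the Java home is to resolve the real path of the java binary and strip the trailing /bin/java. A minimal sketch; the path below is just the typical Ubuntu OpenJDK 8 location, shown for illustration:

```shell
# Resolve JAVA_HOME from the java binary's real path.
# On a live system you would use: java_path=$(readlink -f "$(which java)")
# Shown here with the typical Ubuntu OpenJDK 8 path.
java_path="/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java"
JAVA_HOME="${java_path%/bin/java}"
echo "$JAVA_HOME"
```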

Download the Hadoop binaries here; Hadoop 3.1.2 is a good start, choose the binary file.

Download the HBase binaries here; HBase 2.1.3 is a good start, choose the binary file.

Installing Hadoop

We will follow the Hadoop Documentation to start a cluster in Pseudo-Distributed mode.

Untar the Hadoop archive in a directory, e.g. /home/<username>/hadoop/hadoop-3.1.2

Edit the HDFS configuration files

/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/dfs</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/home/hadoop/dfs/namesecondary</value>
    </property>    
</configuration>
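If you prefer to script these edits, the configuration files can be generated with a heredoc. A minimal sketch for core-site.xml, written to a scratch path here so nothing real is overwritten (point CONF_DIR at your actual etc/hadoop directory when doing this for real):

```shell
# Write the core-site.xml shown above with a heredoc.
# /tmp is used as a scratch target for illustration.
CONF_DIR=/tmp/hadoop-conf-demo
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
# Quick sanity check on the generated file
grep -c '<property>' "$CONF_DIR/core-site.xml"
```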

/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

You should be able to do a ssh localhost without a passphrase. If this is not the case, generate a key with ssh-keygen -t rsa (if you do not already have one), then run ssh-copy-id localhost to enable this.

Initialize hdfs

With the following commands we are going to create the local directory /home/hadoop to store the files, format the NameNode and start the HDFS processes.

[~]> sudo mkdir /home/hadoop
[~]> sudo chown <username>: /home/hadoop
[~]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> bin/hdfs namenode -format
[hadoop-3.1.2]> sbin/start-dfs.sh
[hadoop-3.1.2]> jps
13579 Jps
11757 DataNode
12029 SecondaryNameNode

Now you have an HDFS system up and you can run some hdfs commands to create directories and test it.

You can look at the WebUI to check health and configuration.

NameNode UI : http://localhost:9870

[hadoop-3.1.2]> bin/hdfs dfs -mkdir /user
[hadoop-3.1.2]> bin/hdfs dfs -mkdir /user/<username>
[hadoop-3.1.2]> bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - <username> supergroup          0 2019-03-01 14:39 /user

Edit the Yarn configuration files

/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>        
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/hadoop/nm-local-dir/</value>
    </property>
</configuration>

Start Yarn and test the jobs

With the following commands we are going to start the Yarn ResourceManager, which will run the MapReduce jobs.

[hadoop-3.1.2]> sbin/start-yarn.sh
[hadoop-3.1.2]> jps
22432 NodeManager
22661 Jps
18773 NameNode
22038 ResourceManager
19017 DataNode
19307 SecondaryNameNode

Now that Yarn is running you can test the Yarn WebUI : http://localhost:8088

Let’s run a MapReduce job; during the execution, you can follow the job with the WebUI.

[hadoop-3.1.2]> bin/hdfs dfs -mkdir input
[hadoop-3.1.2]> bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop-3.1.2]> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
[hadoop-3.1.2]> bin/hdfs dfs -get output output
[hadoop-3.1.2]> cat output/*
[hadoop-3.1.2]> bin/hdfs dfs -cat output/*
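As a side note, the grep example job counts the occurrences of every string matching the regex dfs[a-z.]+ in the input files. Roughly the same computation can be sketched locally with plain grep; the sample input below is made up for illustration:

```shell
# Rough local equivalent of the MapReduce grep example:
# extract every match of 'dfs[a-z.]+' and count occurrences.
mkdir -p /tmp/grep-demo
cat > /tmp/grep-demo/sample.xml <<'EOF'
<name>dfs.replication</name>
<name>dfs.replication</name>
<name>dfs.namenode.name.dir</name>
EOF
grep -Eoh 'dfs[a-z.]+' /tmp/grep-demo/*.xml | sort | uniq -c | sort -rn
```

The MapReduce version distributes exactly this extract-and-count pattern across the cluster.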

From this point we have a running Hadoop system in Pseudo-Distributed mode.

Installing HBase

We will follow the HBase Documentation to start HBase in Pseudo-Distributed mode.

Untar the HBase archive in a directory, e.g. /home/<username>/hadoop/hbase-2.1.3

Edit the HBase configuration files

/home/<username>/hadoop/hbase-2.1.3/conf/hbase-site.xml

<configuration>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
</property>
<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
</property>
</configuration>

/home/<username>/hadoop/hbase-2.1.3/conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

Start HBase

With the following commands we are going to start HBase.

[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/start-hbase.sh
[hbase-2.1.3]> jps
22432 NodeManager
18773 NameNode
22038 ResourceManager
25624 HRegionServer
19017 DataNode
25530 HMaster
25451 HQuorumPeer
19307 SecondaryNameNode
25949 Jps

Now you have an HBase database running with your Hadoop in Pseudo-Distributed mode, and you can check your HBase with its WebUI: http://localhost:16010

And view the HBase files in HDFS

[hbase-2.1.3]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> bin/hdfs dfs -ls /hbase
Found 13 items
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/.hbck
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/.tmp
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/MasterProcWALs
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/WALs
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/archive
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/corrupt
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/data
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/hbase
-rw-r--r--   3 rico supergroup         42 2019-03-01 15:16 /hbase/hbase.id
-rw-r--r--   3 rico supergroup          7 2019-03-01 15:16 /hbase/hbase.version
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/mobdir
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/oldWALs
drwx--x--x   - rico supergroup          0 2019-03-01 15:16 /hbase/staging

Testing HBase

With the following commands we are going to do the exercises from the HBase Documentation.

[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.1.3, rda5ec9e4c06c537213883cca8f3cc9a7c19daf67, Mon Feb 11 15:45:33 CST 2019
Took 0.0034 seconds                                                                                                                                                                                   
hbase(main):001:0> 
hbase(main):002:0> create 'test', 'cf'
Created table test
Took 5.3775 seconds                                                                                                                                                                                   
=> Hbase::Table - test
hbase(main):003:0> list 'test'
TABLE                                                                                                                                                                                                 
test                                                                                                                                                                                                  
1 row(s)
Took 0.0218 seconds                                                                                                                                                                                   
=> ["test"]
hbase(main):004:0> describe 'test'
Table test is ENABLED                                                                                                                                                                                 
test                                                                                                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                           
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL =
> 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 
'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                                                                                           
1 row(s)
Took 0.2403 seconds                                                                                                                                                                                   
hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.0952 seconds                                                                                                                                                                                   
hbase(main):006:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0048 seconds                                                                                                                                                                                   
hbase(main):007:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0048 seconds                                                                                                                                                                                   
hbase(main):008:0> scan 'test'
ROW                                                COLUMN+CELL                                                                                                                                        
 row1                                              column=cf:a, timestamp=1551450920340, value=value1                                                                                                 
 row2                                              column=cf:b, timestamp=1551450930781, value=value2                                                                                                 
 row3                                              column=cf:c, timestamp=1551450938910, value=value3                                                                                                 
3 row(s)
Took 0.0247 seconds
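A side note: the HBase shell can also run commands non-interactively from a file (bin/hbase shell /path/to/script). A sketch of such a script, built here without executing it against the cluster; the table name test2 is made up for this example:

```shell
# Build a command file for the HBase shell; on a live cluster you
# would then run:  bin/hbase shell /tmp/hbase-demo.txt
cat > /tmp/hbase-demo.txt <<'EOF'
create 'test2', 'cf'
put 'test2', 'row1', 'cf:a', 'value1'
scan 'test2'
disable 'test2'
drop 'test2'
EOF
cat /tmp/hbase-demo.txt
```

This is handy for repeatable smoke tests after a restart.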

Stop all

With the following commands we are going to stop HBase, Yarn and HDFS.

[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/stop-hbase.sh
[hbase-2.1.3]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> sbin/stop-yarn.sh
[hadoop-3.1.2]> sbin/stop-dfs.sh
[hadoop-3.1.2]> jps
25949 Jps

I hope this howto helps you discover the Hadoop and HBase technologies.
