Installing Hadoop, Yarn and HBase on your Linux in 10 minutes
This is a howto for installing Hadoop (HDFS and MapReduce), Yarn and HBase on your Linux box in 10 minutes (after the binary downloads).
Prerequisites
Install Java (OpenJDK) and find the Java home. On Ubuntu, look in /usr/lib/jvm and choose Java 1.8, not Java 11, which for the moment will give you some trouble with Hadoop.
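For example, on Ubuntu you could install OpenJDK 8 with apt and then look for its home directory (the package name below is the usual Ubuntu one; adjust it if your distribution differs):

[~]> sudo apt-get install openjdk-8-jdk
[~]> ls /usr/lib/jvm        # should list java-1.8.0-openjdk-amd64 among others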
Download the Hadoop binaries here; Hadoop 3.1.2 is a good start, choose the binary file.
Download the HBase binaries here; HBase 2.1.3 is a good start, choose the binary file.
Installing Hadoop
We will follow the Hadoop Documentation to start a cluster in Pseudo-Distributed mode.
Untar the Hadoop archive in a directory, e.g. /home/<username>/hadoop/hadoop-3.1.2
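For instance, assuming the downloaded archive sits in the current directory and is named hadoop-3.1.2.tar.gz (adjust the file name to match your download):

[~]> mkdir -p /home/<username>/hadoop
[~]> tar xzf hadoop-3.1.2.tar.gz -C /home/<username>/hadoop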
Edit the HDFS configuration files
/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/home/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
You should be able to do ssh localhost without a passphrase; if this is not the case, just use the ssh-copy-id localhost command to enable it.
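If you have no SSH key pair yet, a minimal way to set this up is the following (the key type and file name are the usual defaults):

[~]> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # only needed if you have no key yet
[~]> ssh-copy-id localhost
[~]> ssh localhost                              # should now log in without a passphrase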
Initialize HDFS
With the following commands we are going to create the local directory /home/hadoop to store the files, format the NameNode and start the HDFS processes.
[~]> sudo mkdir /home/hadoop
[~]> sudo chown <username>. /home/hadoop
[~]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> bin/hdfs namenode -format
[hadoop-3.1.2]> sbin/start-dfs.sh
[hadoop-3.1.2]> jps
13579 Jps
11757 DataNode
12029 SecondaryNameNode
Now you have an HDFS system up and you can run some hdfs commands to create directories and test it.
You can look at the WebUI to check health and configuration.
NameNode UI: http://localhost:9870
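If you prefer the command line, the standard dfsadmin report gives a quick overview of the NameNode and DataNodes (run from the Hadoop install directory):

[hadoop-3.1.2]> bin/hdfs dfsadmin -report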
[hadoop-3.1.2]> bin/hdfs dfs -mkdir /user
[hadoop-3.1.2]> bin/hdfs dfs -mkdir /user/<username>
[hadoop-3.1.2]> bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - <username> supergroup          0 2019-03-01 14:39 /user
Edit the Yarn configuration files
/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
/home/<username>/hadoop/hadoop-3.1.2/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/nm-local-dir/</value>
  </property>
</configuration>
Start Yarn and test the jobs
With the following commands we are going to start the Yarn ResourceManager and NodeManager, so that MapReduce jobs can run.
[hadoop-3.1.2]> sbin/start-yarn.sh
[hadoop-3.1.2]> jps
22432 NodeManager
22661 Jps
18773 NameNode
22038 ResourceManager
19017 DataNode
19307 SecondaryNameNode
Now that Yarn is running you can check the Yarn WebUI: http://localhost:8088
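You can also list the registered NodeManagers from the command line with the yarn CLI (assuming the default settings above):

[hadoop-3.1.2]> bin/yarn node -list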
Let's run a MapReduce job; during the execution, you can follow the job with the WebUI.
[hadoop-3.1.2]> bin/hdfs dfs -mkdir input
[hadoop-3.1.2]> bin/hdfs dfs -put etc/hadoop/*.xml input
[hadoop-3.1.2]> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
[hadoop-3.1.2]> bin/hdfs dfs -get output output
[hadoop-3.1.2]> cat output/*
[hadoop-3.1.2]> bin/hdfs dfs -cat output/*
From this point we have a running Hadoop system in Pseudo-Distributed mode
Installing HBase
We will follow the HBase Documentation to start HBase in Pseudo-Distributed mode.
Untar the HBase archive in a directory, e.g. /home/<username>/hadoop/hbase-2.1.3
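As before, assuming the downloaded archive sits in the current directory and is named hbase-2.1.3-bin.tar.gz (adjust the file name to match your download):

[~]> tar xzf hbase-2.1.3-bin.tar.gz -C /home/<username>/hadoop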
Edit the HBase configuration files
/home/<username>/hadoop/hbase-2.1.3/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>file:///home/hadoop/zookeeper</value>
  </property>
</configuration>
/home/<username>/hadoop/hbase-2.1.3/conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
Start HBase
With the following commands we are going to start HBase.
[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/start-hbase.sh
[hbase-2.1.3]> jps
22432 NodeManager
18773 NameNode
22038 ResourceManager
25624 HRegionServer
19017 DataNode
25530 HMaster
25451 HQuorumPeer
19307 SecondaryNameNode
25949 Jps
Now you have an HBase database running with your Hadoop in Pseudo-Distributed mode and you can check your HBase with its WebUI: http://localhost:16010
And view the HBase files in HDFS
[hbase-2.1.3]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> bin/hdfs dfs -ls /hbase
Found 13 items
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/.hbck
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/.tmp
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/MasterProcWALs
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/WALs
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/archive
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/corrupt
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/data
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:17 /hbase/hbase
-rw-r--r--   3 rico supergroup         42 2019-03-01 15:16 /hbase/hbase.id
-rw-r--r--   3 rico supergroup          7 2019-03-01 15:16 /hbase/hbase.version
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/mobdir
drwxr-xr-x   - rico supergroup          0 2019-03-01 15:16 /hbase/oldWALs
drwx--x--x   - rico supergroup          0 2019-03-01 15:16 /hbase/staging
Testing HBase
With the following commands we are going to do the exercises from the HBase Documentation.
[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.1.3, rda5ec9e4c06c537213883cca8f3cc9a7c19daf67, Mon Feb 11 15:45:33 CST 2019
Took 0.0034 seconds
hbase(main):001:0>
hbase(main):002:0> create 'test', 'cf'
Created table test
Took 5.3775 seconds
=> Hbase::Table - test
hbase(main):003:0> list 'test'
TABLE
test
1 row(s)
Took 0.0218 seconds
=> ["test"]
hbase(main):004:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.2403 seconds
hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.0952 seconds
hbase(main):006:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0048 seconds
hbase(main):007:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0048 seconds
hbase(main):008:0> scan 'test'
ROW        COLUMN+CELL
 row1      column=cf:a, timestamp=1551450920340, value=value1
 row2      column=cf:b, timestamp=1551450930781, value=value2
 row3      column=cf:c, timestamp=1551450938910, value=value3
3 row(s)
Took 0.0247 seconds
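To clean up after this test, the HBase quickstart also shows how to disable and drop the table from the shell:

hbase(main):009:0> disable 'test'
hbase(main):010:0> drop 'test'
hbase(main):011:0> exit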
Stop all
With the following commands we are going to stop HBase, Yarn and HDFS.
[~]> cd /home/<username>/hadoop/hbase-2.1.3
[hbase-2.1.3]> bin/stop-hbase.sh
[hbase-2.1.3]> cd /home/<username>/hadoop/hadoop-3.1.2
[hadoop-3.1.2]> sbin/stop-yarn.sh
[hadoop-3.1.2]> sbin/stop-dfs.sh
[hadoop-3.1.2]> jps
25949 Jps
I hope this howto will help you discover the Hadoop and HBase technologies.