Installing Hadoop 2.X on Fedora Server 24

Now that we have a fresh installation of Fedora Server 24 and our virtual machine is reachable over the LAN, the next step is to install Apache Hadoop on it.

In this case, as a requirement of @yulwitter, we will be working with Hadoop 2.7.1, although the installation steps are essentially the same for all 2.X versions of Hadoop.

Setting up our environment

The first thing we need to do is start our virtual machine and connect to it through SSH:

$ ssh hpc@192.168.1.20

The IP used is the same one we set in the video of this post. Once we are in, we will update our system so that all installed packages are at their most recent versions.

$ sudo dnf update

In order for the updates to take effect, we need to restart the system:

$ sudo shutdown -r now

And then connect again through SSH. Now we are ready to install the dependencies needed to build Hadoop from source.

Installing Hadoop dependencies

To install all the dependencies needed to build Hadoop 2.7.1 successfully, run the following commands in your terminal:

$ sudo dnf group install "Development Tools" "Development Libraries"
$ sudo dnf install gcc-c++ cmake java-1.8.0-openjdk maven libtool zlib-devel
$ sudo dnf install fuse-devel snappy-devel jansson-devel

You also need protobuf. Installing it with dnf would give you version 2.6.1, but Hadoop requires version 2.5.0, so we need to download it from Google’s GitHub account instead.
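Before downloading, a quick guard can tell you whether a protoc already on the system happens to match. This is a sketch: it assumes protoc prints its version as "libprotoc X.Y.Z", so the second field of that output is the version number.

```shell
# Quick guard: Hadoop 2.7.1 needs protoc 2.5.0 exactly.
# Parsing the second field assumes the "libprotoc X.Y.Z" output format.
required="2.5.0"
found="$(protoc --version 2>/dev/null | awk '{print $2}')"
if [ "$found" = "$required" ]; then
  echo "protoc $found is fine"
else
  echo "need protoc $required, found: ${found:-none}"
fi
```

If the versions do not match (or protoc is absent), proceed with the source build below.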

$ mkdir ~/downloads
$ cd ~/downloads
$ wget -c https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz

Once the file is downloaded, we will decompress, configure, and build protobuf:

$ tar xzvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./autogen.sh
$ ./configure --prefix=/opt/protobuf
$ make
$ make check
$ sudo make install

Now that protobuf is installed, we need to add its binaries to the PATH environment variable. To do this, add the following lines to your shell profile file (~/.bashrc, ~/.profile or ~/.zshrc):

export PROTOBUF_PREFIX="/opt/protobuf"
export PATH="${PATH}:${PROTOBUF_PREFIX}/bin"

We also need to set JAVA_HOME; this can be done by adding the following line to your shell profile file:

export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which javac))))"
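The nested dirname calls simply walk two directories up from the resolved javac binary to reach the JDK root. For example (the JVM path below is only illustrative of where javac might resolve on Fedora):

```shell
# dirname applied twice strips "/bin/javac", leaving the JDK root.
# This path is an example, not necessarily the one on your machine.
javac_path="/usr/lib/jvm/java-1.8.0-openjdk/bin/javac"
java_home="$(dirname "$(dirname "$javac_path")")"
echo "$java_home"   # /usr/lib/jvm/java-1.8.0-openjdk
```

The readlink -f in the profile line is what resolves the /usr/bin/javac symlink to a real path like the one above before the dirname calls run.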

Building Hadoop from its sources

Now that we have installed all of Hadoop’s dependencies, we are ready to build Hadoop. First we need the URL to download the sources. Go to
http://hadoop.apache.org/ and follow these images.

Figure 1. Screenshots (2016-07-20) of the steps to reach the source download link on hadoop.apache.org.

Download the file and decompress it:

$ cd ~/downloads
$ wget -c http://supergsego.com/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1-src.tar.gz
$ tar xzvf hadoop-2.7.1-src.tar.gz
$ cd hadoop-2.7.1-src

To learn how to build Hadoop and some details about the installation, it is a good idea to read the file BUILDING.txt. In our case we will build only the binaries and native components by running:

$ export MAVEN_OPTS="-Xms256m -Xmx512m"; mvn package -Pdist,native -DskipTests -Dtar

The export MAVEN_OPTS="-Xms256m -Xmx512m" part is there to avoid Java memory problems during the build.

After running the command above, expect a long wait for the build to complete.

Installing Hadoop binaries

When the build finishes, the binary distribution is stored in hadoop-dist/target/hadoop-2.7.1. All we need to do is copy this directory to our preferred location for external programs and add its bin and sbin directories to the PATH environment variable.

In my particular case I will copy this directory to /opt:

$ cd ~/downloads/hadoop-2.7.1-src/hadoop-dist/target
$ sudo cp -r hadoop-2.7.1 /opt/hadoop

And the last lines of my shell profile file (~/.bashrc, ~/.profile or ~/.zshrc) will look like this:

export JAVA_HOME="$(dirname $(dirname $(readlink -f $(which javac))))"

export PROTOBUF_PREFIX="/opt/protobuf"
export HADOOP_PREFIX="/opt/hadoop"

export PATH="${PATH}:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin:${PROTOBUF_PREFIX}/bin"
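After reloading the profile (for example with source ~/.bashrc), a quick sanity check that both Hadoop directories and the protobuf binaries really ended up on PATH might look like this. It is only a sketch reproducing the exports above; on the fully set-up VM, hadoop version should then also print 2.7.1.

```shell
# Reproduce the profile exports and verify the PATH entries.
export PROTOBUF_PREFIX="/opt/protobuf"
export HADOOP_PREFIX="/opt/hadoop"
export PATH="${PATH}:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin:${PROTOBUF_PREFIX}/bin"

case ":$PATH:" in
  *":${HADOOP_PREFIX}/bin:"*) echo "hadoop binaries are on PATH" ;;
  *)                          echo "hadoop binaries missing" ;;
esac
```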

Housekeeping

Now it is time to clean up our environment to save space in the virtual machine. We no longer need the files we downloaded, nor the local Maven repository.

$ rm -rf ~/downloads/*
$ rm -rf ~/.m2

And now our virtual machine is tidy, with a clean installation of Hadoop.
