This tutorial shows how to install and configure version 5.7.0 of Cloudera’s Distribution Including Apache Hadoop (CDH 5) on an Ubuntu 16.04 host using Docker.

What’s CDH?

Cloudera describes CDH (Cloudera’s Distribution Including Apache Hadoop) as the most complete, tested, and widely deployed distribution of Apache Hadoop. According to Cloudera, CDH is 100% open source and is the only Hadoop solution to offer batch processing, interactive SQL, and interactive search as well as enterprise-grade continuous availability, and more enterprises have downloaded it than all other distributions combined.

Why Docker?

Docker isolates applications in containers that carry instructions for exactly what they need to run, so they can be ported easily from machine to machine. Virtual machines offer the same isolation. Docker has a simpler structure than a virtual machine, but the area where it really stands out is resource efficiency.

Install Docker

Installing Docker is straightforward. This guide uses Ubuntu 16.04, so check the installation requirements before you start, then follow the steps below.

Uninstall old versions

Older versions of Docker were called docker or docker-engine. If these are installed, uninstall them:

$ sudo apt-get remove docker docker-engine docker.io

The contents of /var/lib/docker, including images, containers, volumes, and networks, are preserved. Check the contents with the command below.

$ sudo ls /var/lib/docker
Contents of /var/lib/docker.

Install using the repository

Set up the repo

Update the apt package index, install packages to allow apt to use a repository over HTTPS, and add Docker’s official GPG key:

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88, by searching for the last 8 characters of the fingerprint.

$ sudo apt-key fingerprint 0EBFCD88
Searching for the last 8 characters of the fingerprint.
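The check above relies on the fact that the search string is simply the last 8 characters of the fingerprint with the spaces removed. A minimal sketch of that relationship follows; fingerprint_suffix is a hypothetical helper written for this illustration and is not part of apt-key.

```shell
# fingerprint_suffix FINGERPRINT
# Hypothetical helper: strips the spaces from a GPG fingerprint and
# prints its last 8 characters.
fingerprint_suffix() {
    compact=$(printf '%s' "$1" | tr -d ' ')
    printf '%s\n' "$compact" | sed 's/.*\(.\{8\}\)$/\1/'
}

# Fingerprint quoted in this tutorial
expected="9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"
fingerprint_suffix "$expected"   # prints 0EBFCD88
```

The printed value is the argument passed to apt-key fingerprint in the command above.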

Use the following command to set up the stable repository. You always need the stable repository, even if you want to install builds from the edge or test repositories as well.

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
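To see what the repository command actually registers, it can help to expand the $(lsb_release -cs) substitution by hand. The sketch below assumes the codename is passed in as an argument; docker_repo_line is a hypothetical helper used only for illustration.

```shell
# docker_repo_line CODENAME
# Hypothetical helper: prints the apt source line that the
# add-apt-repository command registers for a given Ubuntu codename.
docker_repo_line() {
    codename=$1
    printf 'deb [arch=amd64] https://download.docker.com/linux/ubuntu %s stable\n' "$codename"
}

# On Ubuntu 16.04, `lsb_release -cs` prints "xenial"
docker_repo_line xenial
```

On the host itself, docker_repo_line "$(lsb_release -cs)" should print the same line that ends up in the apt sources.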

Install Docker CE

Update the apt package index, list the available versions in the repo, then select and install a version of Docker CE:

$ sudo apt-get update
$ apt-cache madison docker-ce
Choose a specific version of Docker CE.
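The version string needed for the install step is the second column of the apt-cache madison output. As a rough sketch, the column can be pulled out with awk; madison_versions is a hypothetical helper, and the sample line below is illustrative rather than captured from a real run.

```shell
# madison_versions
# Hypothetical helper: reads `apt-cache madison <package>` output on stdin
# (columns separated by "|") and prints the version column only.
madison_versions() {
    awk -F'|' 'NF >= 2 { gsub(/[ \t]/, "", $2); print $2 }'
}

# Illustrative sample line in the madison column layout
printf ' docker-ce | 5:18.09.0~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages\n' \
    | madison_versions
```

On a configured host the same helper could be fed directly: apt-cache madison docker-ce | madison_versions.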

Install a specific version by its fully qualified package name, for example, docker-ce=5:18.09.0~3-0~ubuntu-xenial, and verify that Docker CE is installed correctly by running the hello-world image.

$ sudo apt-get install docker-ce=5:18.09.0~3-0~ubuntu-xenial
$ sudo docker run hello-world

The last command downloads a test image and runs it in a container. When the container runs, it prints an informational message and exits.

Informational message to verify Docker installation.

Importing the Cloudera QuickStart Image

Before importing the image, make sure Docker is running, then run the command below from your home directory, /home/your_name. The image is large, so the download takes a few minutes; it is a good moment for a cup of tea.

$ docker pull cloudera/quickstart:latest

The next image shows the output of a successful download.

Information message when downloading the image.

Running a Cloudera QuickStart Container

To run a container using the image, you must know the name or hash of the image. If you followed the instructions above, the name is cloudera/quickstart:latest. The hash is also printed in the terminal when you import the image, or you can look up the hashes of all imported images with:

$ docker images

Once you know the name or hash of the image, you can run it:

$ docker run -m 4G --memory-reservation 2G --memory-swap 8G --hostname=quickstart.cloudera --privileged=true -t -i -v $(pwd):/zaid --publish-all=true -p 8888 -p 8088 cloudera/quickstart /usr/bin/docker-quickstart

This command tells Docker to run the image with a hard memory limit of 4 GB (-m 4G) and a soft limit of 2 GB (--memory-reservation 2G), which must be smaller than the hard limit and takes effect when Docker detects contention or low memory on the host. The --memory-swap 8G option sets the combined amount of memory and swap the container may use, so the container can swap up to 4 GB to disk. Privileged mode (--privileged=true) is required for HBase. The -i option keeps the session interactive and -t allocates a terminal for it. The -v option shares a volume with the container: because the command is run from the home directory, $(pwd) expands to that directory, and anything placed in it shows up in the container under /zaid. Change this path to the directory that holds your own files. The --publish-all=true option publishes every port the image exposes on a randomly chosen host port, so you can reach programs such as Hue on container port 8888 and YARN on container port 8088, along with the other services.
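The memory flags interact in a way that is easy to misread. Docker documents --memory-swap as the total of memory plus swap, so the swap actually available is the difference between the two values. A small worked calculation follows; swap_allowance_gb is a hypothetical helper for this illustration.

```shell
# swap_allowance_gb MEMORY_GB MEMORY_SWAP_GB
# Hypothetical helper: prints how many GB of swap a container may use,
# given its -m value and its --memory-swap value (both in GB).
swap_allowance_gb() {
    memory_gb=$1
    memory_swap_gb=$2
    echo $(( memory_swap_gb - memory_gb ))
}

# Values from the docker run command in this tutorial: -m 4G --memory-swap 8G
swap_allowance_gb 4 8   # prints 4
```

With these settings the container gets 4 GB of RAM plus up to 4 GB of swap, not 8 GB of swap.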

Use the command below to check that the container was deployed successfully.

$ docker ps -a
Information message about container Id.

Getting HUE and YARN to work

To confirm that Hue and YARN are running in the container, take the container ID from the output of the last command and pass it to the docker inspect command.

$ docker inspect [CONTAINER ID]
Network settings of the Docker container.

The last image shows that Hue listens on port 8888 inside the container, which is mapped to port 32768 on the host (our localhost), and that YARN listens on port 8088 inside the container, mapped to port 32769 on the host.
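Reading the mapping out of the full docker inspect output by eye is error-prone. The docker port command prints the same mappings in a compact form, one per line, such as "8888/tcp -> 0.0.0.0:32768". The sketch below parses that format; host_port_for is a hypothetical helper, and the sample input mirrors the port numbers seen in this guide, which will differ on other machines.

```shell
# host_port_for CONTAINER_PORT
# Hypothetical helper: reads `docker port <container>` output on stdin
# and prints the host port mapped to the given container TCP port.
host_port_for() {
    awk -v port="$1" '
        $1 == (port "/tcp") {
            n = split($3, parts, ":")   # split host address from host port
            print parts[n]              # the port is the last piece
            exit
        }'
}

# Sample input matching the mapping described in this guide
printf '8088/tcp -> 0.0.0.0:32769\n8888/tcp -> 0.0.0.0:32768\n' | host_port_for 8888
```

Against a running container, the equivalent would be docker port [CONTAINER ID] | host_port_for 8888.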

The images below are our proof that Hue and YARN work as expected: enter localhost:32768 in the browser for Hue and localhost:32769 for YARN.

HUE

Quick start page of HUE.

YARN

YARN all applications page.

Collect system information

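By default, Docker CE on Linux does not cap a container's memory, so a container can use as much of the system's memory as the kernel allows. The minimum you should allocate is 4 GB. The laptop used for this guide has only 8 GB, so we allocate 4 GB to the container while it is running.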

The command below shows that the laptop has 8031140 kB of memory.

$ sudo cat /proc/meminfo
Information about my laptop memory.
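The full /proc/meminfo listing is long, and only the MemTotal line matters here. A minimal sketch for extracting it and converting it to MiB follows; mem_total_kb and kb_to_mib are hypothetical helpers written for this illustration.

```shell
# mem_total_kb [FILE]
# Hypothetical helper: prints the MemTotal value in kB from a
# /proc/meminfo-style file (defaults to /proc/meminfo).
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# kb_to_mib KB
# Hypothetical helper: integer conversion from kB to MiB, rounding down.
kb_to_mib() {
    echo $(( $1 / 1024 ))
}

# The value reported on the laptop used in this guide
kb_to_mib 8031140   # prints 7842
```

Called without an argument, mem_total_kb reads the live /proc/meminfo, so kb_to_mib "$(mem_total_kb)" reports the total for the current host.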

To see whether the container is running within this minimum configuration, use this command.

$ docker stats [CONTAINER ID]
Live stream of container statistics.

The command prints a live stream of statistics; during this run, the container's memory usage ranged between 1.77 GB and 4 GB.
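When a single reading is enough, docker stats can be asked for one snapshot instead of a stream. The sketch below wraps that call; mem_snapshot is a hypothetical helper, while --no-stream and --format (with the .Name, .MemUsage, and .MemPerc placeholders) are standard docker stats options.

```shell
# mem_snapshot CONTAINER
# Hypothetical helper: prints one memory reading for the container
# and exits, instead of streaming updates.
mem_snapshot() {
    docker stats --no-stream --format '{{.Name}}: {{.MemUsage}} ({{.MemPerc}})' "$1"
}

# Usage, with the container ID taken from `docker ps -a`:
#   mem_snapshot [CONTAINER ID]
```

This form is easier to use in scripts, for example to log memory usage at intervals while a Hadoop job runs.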
