There are many ways to use Docker. In this blog post, I’ll summarize the steps I took to create a working Hadoop Docker image based on Cloudera’s Hadoop distribution (CDH5), covering both MapReduce MRv1 (the “old” MapReduce) and MRv2 (the “new” MapReduce, aka YARN). Before we start, please make sure Docker is installed on your local OS.
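If you prefer to check this from a script, a minimal Python sketch follows. It assumes only that the Docker client binary is called docker and is on your PATH; the function name docker_version is illustrative and not part of any tool.

```python
import shutil
import subprocess
from typing import Optional


def docker_version() -> Optional[str]:
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    exe = shutil.which("docker")  # look up the docker client on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


version = docker_version()
print(version if version else "Docker not found: please install it first.")
```

The client answers docker --version even when the Docker daemon is not running, so this only tells you that Docker is installed, not that it is ready to start containers.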
Ready-to-use images for MapReduce v1 and MapReduce v2:
If you want to keep it simple, you can of course use one of my prebuilt Hadoop Docker images to execute Hadoop MapReduce jobs right away and skip the remaining steps.
MapReduce v2 - YARN Architecture
Step-by-step tutorial
If you want to start from scratch, you first need a basic OS image to work with. For that, pull a fresh Ubuntu 14.04 image from Docker Hub and run it. The run command starts a new container, which is a running instance of the Ubuntu image.
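As a minimal sketch, these two steps correspond to the standard docker pull and docker run commands. The Python helpers below (hypothetical names) only assemble and print the command lines; the -i and -t flags keep an interactive terminal attached so you can work inside the container.

```python
import shlex
from typing import List

# Base image used in this tutorial: Ubuntu 14.04 from Docker Hub.
BASE_IMAGE = "ubuntu:14.04"


def pull_cmd(image: str) -> List[str]:
    """docker pull <image>: download the image from the registry."""
    return ["docker", "pull", image]


def run_interactive_cmd(image: str, shell: str = "/bin/bash") -> List[str]:
    """docker run -i -t <image> <shell>: start a new container with an interactive shell."""
    return ["docker", "run", "-i", "-t", image, shell]


for cmd in (pull_cmd(BASE_IMAGE), run_interactive_cmd(BASE_IMAGE)):
    print(shlex.join(cmd))  # show the command as you would type it
```

To actually execute a command, pass the list to subprocess.run(cmd, check=True). Printing instead of executing keeps the sketch safe to run on a machine without Docker.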
Now, back on your local OS (type exit to leave the Ubuntu container from before), create a new folder containing an empty file named Dockerfile. The Dockerfile should contain all the commands needed to build the new image. For a complete example, have a look at the Dockerfile in my GitHub repository.
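The following Python sketch creates such a folder and Dockerfile. The folder name hadoop-docker and the Dockerfile content are hypothetical placeholders: the skeleton only pins the Ubuntu 14.04 base image and documents the default web UI ports, while the actual CDH5 installation steps are the ones in the Dockerfile from the GitHub repository mentioned above, which are not reproduced here.

```python
from pathlib import Path

# Hypothetical folder name; any empty folder works.
PROJECT_DIR = Path("hadoop-docker")

# Hypothetical skeleton, NOT the tutorial's real Dockerfile: it only pins the
# base image and documents the default web UI ports. The CDH5 installation
# steps (RUN instructions) would go where the placeholder comment is.
DOCKERFILE_SKELETON = """\
FROM ubuntu:14.04

# Placeholder: RUN instructions that install and configure CDH5 go here.

# Default web UI ports: 50030 = MRv1 JobTracker, 8088 = YARN ResourceManager.
EXPOSE 50030 8088
"""


def create_project(directory: Path, content: str = DOCKERFILE_SKELETON) -> Path:
    """Create the folder if needed and write a Dockerfile into it; return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    dockerfile = directory / "Dockerfile"
    dockerfile.write_text(content)
    return dockerfile
```

Calling create_project(PROJECT_DIR) creates hadoop-docker/Dockerfile relative to the current working directory; pass content="" if you prefer to start with an empty file, as described above. Note that EXPOSE only documents ports; they still have to be published when the container is started.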
When you are satisfied with your Dockerfile, you are ready to build your first Docker image. Just execute the docker build command in the folder where your Dockerfile is located. Keep in mind that you have to rerun the build command every time you change the Dockerfile.
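A hedged Python sketch of the build-and-run cycle: the helpers assemble docker build and docker run command lines for a hypothetical image tag cdh5-hadoop. Publishing a port with -p is what makes the container's web UI reachable on localhost; running detached with -d assumes that the image's default command starts the Hadoop daemons and keeps the container alive. The digest helpers illustrate the rebuild rule by hashing the Dockerfile.

```python
import hashlib
import shlex
from pathlib import Path
from typing import List, Optional, Sequence

# Hypothetical image tag; choose any name you like.
IMAGE_TAG = "cdh5-hadoop"


def build_cmd(tag: str, context_dir: str = ".") -> List[str]:
    """docker build -t <tag> <dir>: build an image from the Dockerfile in <dir>."""
    return ["docker", "build", "-t", tag, context_dir]


def run_cmd(tag: str, ports: Sequence[int] = (50030,)) -> List[str]:
    """docker run -d -p <port>:<port> ... <tag>: start a detached container
    and publish each listed container port on the same host port."""
    cmd = ["docker", "run", "-d"]
    for port in ports:
        cmd += ["-p", f"{port}:{port}"]
    return cmd + [tag]


def dockerfile_digest(path: Path) -> str:
    """SHA-256 of the Dockerfile contents (other build-context files are ignored)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(path: Path, last_digest: Optional[str]) -> bool:
    """True if the Dockerfile changed since the digest recorded at the last build."""
    return dockerfile_digest(path) != last_digest


print(shlex.join(build_cmd(IMAGE_TAG)))
print(shlex.join(run_cmd(IMAGE_TAG)))
```

Only the Dockerfile itself is hashed here; if your Dockerfile copies other files into the image, changes to those files also call for a rebuild.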
Once a container based on the new image is running, you should be able to open the JobTracker web UI at http://localhost:50030 from your local OS and execute MapReduce jobs on the command line.
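To check from a script that the web UI is up, a small probe like the following sketch is enough. It assumes the default ports: 50030 for the MRv1 JobTracker and 8088 for the YARN ResourceManager (the latter is relevant if your image runs MRv2). The names UI_PORTS, ui_url and web_ui_reachable are illustrative.

```python
import urllib.error
import urllib.request

# Default web UI ports: 50030 = MRv1 JobTracker, 8088 = YARN ResourceManager.
UI_PORTS = {"mrv1": 50030, "yarn": 8088}


def ui_url(mode: str, host: str = "localhost") -> str:
    """Build the web UI URL for 'mrv1' or 'yarn'."""
    return f"http://{host}:{UI_PORTS[mode]}"


def web_ui_reachable(url: str, timeout: float = 5.0) -> bool:
    """True if an HTTP server answers at url, False if nothing is listening."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so something is listening.
        return True
    except OSError:
        # Connection refused, timeout, name resolution failure, ...
        return False


if __name__ == "__main__":
    url = ui_url("mrv1")
    print(url, "reachable" if web_ui_reachable(url) else "not reachable")
```

If the probe fails although the container is running, check that the port was published when the container was started; EXPOSE in the Dockerfile alone does not make it reachable from the host.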
That’s all for today. In the next blog post, I’ll show how our Hadoop execution platform Cloudgene uses such an image for an easy installation and execution process.