Preface

Apache Hadoop is open-source software for reliable, scalable distributed computing. It provides a framework for the distributed processing of large data sets across clusters of computers, and it is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Hadoop is also designed to detect and handle failures at the application layer, so it is easy to deliver a highly available service on top of a cluster of computers.

Setting up a Single Node Cluster

The following steps show how to set up a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Build and Install Hadoop 2.x or newer on Windows

Hadoop is compatible with Windows Server 2008, Windows Server 2008 R2, Windows Vista and Windows 7. Windows XP and Cygwin are not supported.

Set JAVA_HOME

Hadoop was tested on JDK 1.6 and 1.7. Make sure that JAVA_HOME is set in your environment and that its value does not contain any spaces. If it does, you must use the Windows 8.3 short pathname instead.

e.g. use c:\Progra~1\Java\... instead of c:\Program Files\Java\....
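The JAVA_HOME requirement above can be checked before starting a build. The following is a minimal Python sketch, not part of Hadoop itself; the function name check_java_home is hypothetical, and it only tests the two conditions stated above (the variable is set, and its value contains no spaces).

```python
import os


def check_java_home(value):
    """Return a list of problems with a JAVA_HOME value (empty if none found).

    Hypothetical helper: it checks only the two conditions described in
    the text, not whether the directory actually holds a JDK.
    """
    problems = []
    if not value:
        problems.append("JAVA_HOME is not set")
    elif " " in value:
        problems.append("JAVA_HOME contains spaces; use the 8.3 short pathname")
    return problems


# Report any problems with the current environment.
for problem in check_java_home(os.environ.get("JAVA_HOME")):
    print(problem)
```

A value such as c:\Progra~1\Java\... passes the check, while c:\Program Files\Java\... is reported because of the space.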

Download the sources

Download the latest stable release sources from one of the following:

ASF Hadoop download page
Subversion URL: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.5
Git repository URL: git://git.apache.org/hadoop-common.git
After downloading via git, switch to the desired branch using git checkout.
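The git route can be sketched in Python as a clone followed by a checkout. This is only an illustration: the helper names git_steps and fetch_sources and the target directory are hypothetical, and the branch name branch-2.5 is an assumption taken from the Subversion URL above, so confirm that the branch exists in the git repository before relying on it.

```python
import subprocess

REPO_URL = "git://git.apache.org/hadoop-common.git"  # from the list above
BRANCH = "branch-2.5"  # assumption: same name as the Subversion branch


def git_steps(repo_url, branch, target_dir):
    """Return (command, working_directory) pairs for clone and checkout."""
    return [
        (["git", "clone", repo_url, target_dir], None),
        (["git", "checkout", branch], target_dir),
    ]


def fetch_sources(repo_url=REPO_URL, branch=BRANCH, target_dir="hadoop-common"):
    """Run each step in order, stopping at the first command that fails."""
    for args, cwd in git_steps(repo_url, branch, target_dir):
        subprocess.run(args, cwd=cwd, check=True)
```

Calling fetch_sources() requires git on the PATH and network access. Building the command list separately from running it makes it possible to inspect the steps without touching the network.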

Installing Dependencies and Setting up Environment for Building

The BUILDING.txt file in the root of the source tree has detailed information on the list of requirements and how to install them. It also includes information on setting up the environment and a few quirks that are specific to Windows. It is strongly recommended that you read and understand it before proceeding.

Hadoop Tutorial - on-going

Thursday, 23 October 2014
