Jan 14, 2011

Hadoop EC2 scripts broken on Windows / cygwin

Grrr... As of 0.21.0, the list-hadoop-clusters and delete-hadoop-cluster scripts need the rev utility, which is neither installed by default nor available as an add-on (unless I'm braindead, which is a definite possibility).

Quick fix: use tac instead

$ ln -s /bin/tac /bin/rev

Fugly, and tac is not a true substitute for rev, but now the scripts run...
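
Since rev reverses the characters within each line while tac reverses the order of the lines, a closer stand-in can be put together with awk. This is only a minimal sketch: the function name rev_fallback is made up for illustration, it assumes plain ASCII input, and I have not checked it against what the Hadoop scripts actually pipe into rev.

```shell
# Hypothetical stand-in for rev: reverse the characters of each input line.
rev_fallback() {
  awk '{
    out = ""
    # Walk the line from its last character to its first.
    for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
    print out
  }'
}

printf 'abc def\n' | rev_fallback   # prints "fed cba"
```

Saved as an executable script named rev somewhere on the PATH, the same awk one-liner could replace the symlink.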

Jan 10, 2011

Installing Hadoop on Windows / cygwin

Highly unnecessary... unless you're stuck with a Windows machine :-/

1) Install Cygwin

This is as straightforward as it gets, but don't forget to add your favorite editor: vim is not included in the default install (!).

The default directory is c:\cygwin, and there is no reason to change it.

2) Install the Java Development Kit

Avoid spaces in the installation directory, since they tend to trip up shell scripts: c:\jdk1.6 will do nicely.

3) Install Hadoop

Download the latest release (0.21.0 at the time of writing) and extract it to d:\work (or something similar).

4) Fix your environment variables

Start Cygwin and append the following lines at the end of your .bashrc file (no shell prompt in front, since these go into the file as-is):

export JAVA_HOME=/cygdrive/c/jdk1.6
export HADOOP_INSTALL=/cygdrive/d/work/hadoop-0.21.0
export PATH=$PATH:$HADOOP_INSTALL/bin
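
After opening a new Cygwin shell, it can be worth checking that both variables point at real directories before going further. A minimal sketch, assuming the paths used above; check_dir is a hypothetical helper name, not something shipped with Hadoop or Cygwin.

```shell
# Hypothetical helper: report whether a variable's directory exists.
check_dir() {
  # $1 = label to print, $2 = directory path to test
  if [ -d "$2" ]; then
    echo "ok: $1 -> $2"
  else
    echo "missing: $1 -> $2"
    return 1
  fi
}

# ${VAR:-} keeps the check from blowing up if the variable is unset.
check_dir JAVA_HOME "${JAVA_HOME:-}" || echo "fix JAVA_HOME in .bashrc"
check_dir HADOOP_INSTALL "${HADOOP_INSTALL:-}" || echo "fix HADOOP_INSTALL in .bashrc"
```

A "missing" line usually means a typo in the /cygdrive/... path or a shell that was started before .bashrc was edited.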


5) Fix the hadoop-config script

$ vi $HADOOP_INSTALL/bin/hadoop-config.sh

Locate the section starting with "# cygwin path translation" and add the following line:

CLASSPATH=`cygpath -wp "$CLASSPATH"`

Save and exit.
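
For the curious: Java on Windows expects a semicolon-separated list of Windows paths, and cygpath -wp turns a colon-separated POSIX path list into exactly that (-w for Windows form, -p for "this is a path list"). The sketch below shows the conversion in isolation. The function name to_windows_classpath and the sample classpath are made up for illustration, and the guard simply passes the value through when cygpath is not installed.

```shell
# Hypothetical wrapper: convert a path list only when cygpath is available,
# i.e. when actually running under Cygwin.
to_windows_classpath() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath -wp "$1"
  else
    # Not on Cygwin: leave the path list untouched.
    printf '%s\n' "$1"
  fi
}

# Illustrative classpath only, built from the directories used in this post.
sample="/cygdrive/c/jdk1.6/lib/tools.jar:/cygdrive/d/work/hadoop-0.21.0/conf"
converted=$(to_windows_classpath "$sample")
echo "$converted"
```

Under Cygwin the echoed value should come back with drive letters, backslashes and semicolons; anywhere else it is printed unchanged.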

6) Test your installation

$ hadoop version
Hadoop 0.21.0
etc etc.


That's it. Happy hadoop'ing :)