[Updated 12.19.12: We are now using MongoDB for Gander. Still, we spent several months getting decently cozy with HBase, so if you have any questions, feel free to ask in the comments or on Twitter!]

I recently needed to add Apache HBase to my pseudo-distributed installation of Apache Hadoop. In the process of installing it, though, I hit a number of poorly documented obstacles. I ultimately got it working, and here's what I did:

Download

First of all, I downloaded HBase 0.90.4 from http://www.apache.org/dyn/closer.cgi/hbase/. Specifically, I got "hbase-0.90.4.tar.gz". Later versions may work, but I have some third-party tools that won't work on later versions.

Install

I unzipped the contents of the package and moved it to /usr/local/hbase, so the directory structure looks like:

 /usr/local/hbase/
     bin/
     conf/
     lib/
     ...
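The unpack-and-move step looks roughly like this. This is a sketch: a stand-in tarball is built first so the commands can be run anywhere; with the real hbase-0.90.4.tar.gz you'd skip that first stanza, and the move into /usr/local/hbase needs sudo (here /tmp/hbase-demo stands in for it):

```shell
# Stand-in tarball so this sketch runs anywhere; with the real release
# archive, skip this first stanza.
cd /tmp
mkdir -p hbase-0.90.4/bin hbase-0.90.4/conf hbase-0.90.4/lib
tar -czf hbase-0.90.4.tar.gz hbase-0.90.4
rm -r hbase-0.90.4

# Unpack and move into place. For the real install the destination is
# /usr/local/hbase (sudo required); /tmp/hbase-demo stands in here.
tar -xzf hbase-0.90.4.tar.gz
rm -rf /tmp/hbase-demo
mv hbase-0.90.4 /tmp/hbase-demo
ls /tmp/hbase-demo    # bin  conf  lib
```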

Configure Environment

I exported some variables in my ~/.bashrc:

export HBASE_HOME=/usr/local/hbase

I believe this one is generally required.

export PATH=${PATH}:${HBASE_HOME}/bin

This one isn't strictly necessary, but it puts the hbase command on your PATH, along with some other shell scripts HBase supplies. You could also make a symlink to "/usr/local/hbase/bin/hbase" and put that in /usr/local/bin or something.

Configure

Next, I configured HBase by adding the following to these files:

/usr/local/hbase/conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export HBASE_LOG_DIR=/tmp/hbase/logs

Your JAVA_HOME may be different. And of course, if you already have JAVA_HOME in your environment, you shouldn't need to set it here. You also don't have to set HBASE_LOG_DIR, but without it HBase writes to /usr/local/hbase/logs, which means you'd need to give the account running HBase write permissions to /usr/local/hbase, which I didn't want to do.
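Since HBASE_LOG_DIR points at /tmp here, it can be worth pre-creating the directory and checking that it's writable by the account that will run HBase (a small sketch; HBase will normally create the directory on startup anyway):

```shell
# Pre-create the log directory named by HBASE_LOG_DIR above and make
# sure the current account can write to it.
mkdir -p /tmp/hbase/logs
test -w /tmp/hbase/logs && echo "ok: /tmp/hbase/logs is writable"
```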

/usr/local/hbase/conf/hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Of course, you'll want to use whatever port you configured HDFS for, which might not be 9000.
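That port comes from fs.default.name in Hadoop's core-site.xml, and hbase.rootdir has to agree with it. Here's a sketch of checking it; a sample config is written under /tmp so the grep is runnable as-is, but on a real install you'd grep ${HADOOP_HOME}/conf/core-site.xml instead:

```shell
# Sample core-site.xml, standing in for ${HADOOP_HOME}/conf/core-site.xml.
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

# The host and port in this value are what hbase.rootdir must match.
grep -A1 'fs.default.name' /tmp/hadoop-conf/core-site.xml
```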

"Extra" steps

There were a few things I needed to do before my HBase installation worked correctly.

/etc/hosts

The first one was to edit my "/etc/hosts" and change this line...

127.0.1.1    ubuntu

into this...

127.0.0.1    ubuntu

This seems to be a necessary workaround for HBASE-5004, which is still an issue at the time I'm writing this (01/13/2012). The "127.0.1.1    ubuntu" entry is present on standard Ubuntu distributions, though you might not have exactly this line anymore. In any case, if you're encountering "Timed out" problems when using the hbase shell, it's probably related to /etc/hosts.
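The edit can be done with a one-line sed. Here it runs on a scratch copy so you can see the rewrite first; applying it to /etc/hosts itself needs sudo, and backing the file up beforehand is a good idea:

```shell
# Scratch copy standing in for /etc/hosts.
printf '127.0.1.1\tubuntu\n' > /tmp/hosts.sample

# Rewrite 127.0.1.1 to 127.0.0.1 at the start of the line.
# Against the real file: sudo sed -i.bak 's/^127\.0\.1\.1/127.0.0.1/' /etc/hosts
sed -i 's/^127\.0\.1\.1/127.0.0.1/' /tmp/hosts.sample
cat /tmp/hosts.sample    # 127.0.0.1  ubuntu
```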

Copy over jars

I needed to copy over certain jars from my hadoop installation to my hbase lib directory. Specifically, I needed to run:

cp ${HADOOP_HOME}/hadoop-core-*.jar   ${HBASE_HOME}/lib/
cp ${HADOOP_HOME}/lib/commons-configuration-*.jar   ${HBASE_HOME}/lib/

Why? Presumably the jars HBase ships with were incompatible with my version of Hadoop. All I know is that without them, I was getting errors like "HBase is able to connect to ZooKeeper but the connection closes immediately".

Finish

This is what it took for me to get set up. Considering how quickly the Hadoop environment is changing, these steps will likely become obsolete (if they're not already). But I hope someone derives some use from this.