Saturday, February 11, 2012

Install Pseudo-distributed HBase with Hadoop on Mac OS X.


Here are my notes from installing pseudo-distributed HBase on Hadoop. I use Hadoop 0.20.205 (hadoop-0.20.205.0.tar.gz) and HBase 0.90 (hbase-0.90.5.tar.gz). You can download them from http://hadoop.apache.org/common/releases.html and http://www.apache.org/dyn/closer.cgi/hbase/

Prerequisites:
You need to have pseudo-distributed Hadoop installed and working first. In my case, I installed Hadoop in /usr/local/hadoop20 and put it on the PATH.

export HADOOP_HOME=/usr/local/hadoop20
export PATH=$PATH:$HADOOP_HOME/bin
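
Before moving on, a quick sanity check that Hadoop is reachable from the shell (output will vary with your install):

$ which hadoop       # should print /usr/local/hadoop20/bin/hadoop
$ hadoop version     # should report 0.20.205.0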

Steps to Install HBase
1. Download hbase-0.90.5.tar.gz, place it in /usr/local, and untar it there; then create a hbase symlink:

cd /usr/local

tar xvf hbase-0.90.5.tar.gz 

ln -s hbase-0.90.5 hbase
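
Optionally, put HBase on the PATH as well, mirroring the Hadoop setup above, so the bin/ scripts can be run from anywhere:

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin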


2. Configure network settings:

Run the command:

lei:lei$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet 127.0.0.1 netmask 0xff000000 
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 00:25:4c:e3:b9:7d 
media: autoselect
status: inactive
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 00:25:00:4d:74:9f 
inet6 fe80::225:ff:fe4c:749e%en1 prefixlen 64 scopeid 0x5 
inet 192.168.5.62 netmask 0xffffff00 broadcast 192.168.5.255
media: autoselect
status: active
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
lladdr 00:25:4c:gg:gf:e3:b9:7d
media: autoselect <full-duplex>
status: inactive


Here is the content of my /etc/hosts:

$ cat /etc/hosts

127.0.0.1 lei.hadoop.local hbase localhost lei
192.168.5.62 home.lei.local
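
Since HBase is sensitive to hostname resolution, it is worth confirming that the names above resolve before going further (the hostnames are from my machine; substitute your own):

$ ping -c 1 lei.hadoop.local
$ ping -c 1 home.lei.local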


3. Configure HBase

In /usr/local/hbase/conf, edit hbase-env.sh:

$ vi hbase-env.sh

add the following line:

export JAVA_HOME=/Library/Java/Home
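
If you are not sure where your JDK lives, Mac OS X ships a helper that prints the current Java home; its output can be used for JAVA_HOME instead of the path above:

$ /usr/libexec/java_home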

$ vi hbase-site.xml

add the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>home.lei.local</value>
  </property>

  <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>lei.hadoop.local</value>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/zookeeper</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://lei.hadoop.local:9000/hbase2</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.master</name>
    <value>lei.hadoop.local:60000</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


The hbase.rootdir entry needs to match fs.default.name in Hadoop's core-site.xml: same scheme, host, and port, with the path component (/hbase2 here) being the HDFS directory HBase will use.

Here are my settings:

lei:conf lei$ cat  ../../hadoop20/conf/core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://lei.hadoop.local:9000</value>
        <description></description>
    </property>
</configuration>


lei$ cat regionservers
lei.hadoop.local
home.lei.local


Start and Run Hadoop and HBase:

1. Start Hadoop:

lei:lei$ start-all.sh 

lei:lei$ hadoop fs -ls

lei:lei$ hadoop fs -mkdir /hbase2

(The directory name matches the path component of hbase.rootdir in hbase-site.xml.)

lei:lei$ jps
84756 NameNode
84926 SecondaryNameNode
1412 
85095 Jps
84841 DataNode
lei:lei$ 

Note: you need to copy the jars under hadoop20/share/hadoop/lib/ into hbase/lib/. HBase must run against the same Hadoop jars as the cluster; HBase 0.90 ships with its own hadoop-core jar, and a version mismatch with the running cluster can keep HBase from starting.
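
A minimal sketch of the copy, assuming the 0.20.205 tarball layout used above (adjust paths and jar names to your install):

cd /usr/local/hbase
mkdir -p lib.orig
mv lib/hadoop-core-*.jar lib.orig/            # set aside the jar HBase ships with
cp $HADOOP_HOME/hadoop-core-0.20.205.0.jar lib/
cp $HADOOP_HOME/share/hadoop/lib/*.jar lib/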

2. Start HBase:

lei:hbase lei$ ./bin/start-hbase.sh


lei:conf lei$ jps
84756 NameNode
84926 SecondaryNameNode
1412 
85439 Jps
85388 HRegionServer
85268 HQuorumPeer
84841 DataNode
85298 HMaster

3. Verify HBase:

lei:hbase lei$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec  9 05:40:36 UTC 2011

hbase(main):001:0> status
1 servers, 0 dead, 2.0000 average load

hbase(main):002:0> create 'test', 'cf'
0 row(s) in 3.9240 seconds

hbase(main):003:0> list 'test'
TABLE                                                                                                                                                                               
test                                                                                                                                                                                
1 row(s) in 0.0820 seconds

hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.3320 seconds

hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0490 seconds

hbase(main):006:0> scan 'test'
ROW                                            COLUMN+CELL                                                                                                                          
 row1                                          column=cf:a, timestamp=1328987613037, value=value1                                                                                   
 row2                                          column=cf:b, timestamp=1328987619457, value=value2                                                                                   
2 row(s) in 0.1020 seconds

hbase(main):007:0> disable 'test'
0 row(s) in 2.0980 seconds

hbase(main):008:0> drop 'test'
0 row(s) in 1.2400 seconds

hbase(main):009:0> exit
lei:hbase lei$ 
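
Besides the shell, the master's web UI is a quick way to confirm the region servers have checked in (60010 is the default info port for 0.90):

$ open http://localhost:60010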

4. Stop HBase and Hadoop:

lei:hbase lei$ ./bin/stop-hbase.sh 
stopping hbase.....
home.lei.local: stopping zookeeper.
lei:hbase lei$ 


lei:hbase lei$ stop-all.sh 
no jobtracker to stop
lei.hadoop.local: no tasktracker to stop
stopping namenode
lei.hadoop.local: stopping datanode
lei.hadoop.local: stopping secondarynamenode
lei:hbase lei$ 
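
As a final check, jps should now list nothing but the Jps process itself:

lei:hbase lei$ jps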

You can find a useful troubleshooting link here: http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Enjoy.

3 comments:

  1. Thank you. The guidelines are perfectly apt. I ended up fixing a few errors.

    Firstly, I was not able to start the HMaster. It was a permission issue; I had to chmod -R 777 my hbase directory.

    Secondly, there is a need to copy the hadoop core jar and all the other jars in Hadoop to the HBase lib directory. Many of the forums have emphasized the need for maintaining compatibility between the JARs.

  2. The above setup worked for me at home. However, when I went to university, the region servers were not working. Digging in, I realized I also had to set up forward and reverse resolution for the domains specified. How to? - https://help.ubuntu.com/community/BIND9ServerHowto
