Wednesday, February 15, 2012

Install Pseudo-distributed HAMA with Hadoop on Mac OS X


Here is a note on installing pseudo-distributed HAMA on Hadoop. I use Hadoop 0.20.2 (hadoop-0.20.2.tar.gz) and HAMA 0.3.0 (hama-0.3.0-incubating.tar.gz). The current stable release of HAMA works with the Hadoop 0.20.x line. 



Prerequisites:
You need to have pseudo-distributed Hadoop installed and working first. In my case, I have installed Hadoop at /usr/local/hadoop20, and you also need to put Hadoop on your PATH. 



Steps to Install HAMA
1. Download hama-0.3.0-incubating.tar.gz, place it at /usr/local, and untar it there
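The commands did not survive in this copy of the post; a sketch, assuming the same layout I use for HBase below (the symlink name is my convention):

```shell
cd /usr/local
tar xvf hama-0.3.0-incubating.tar.gz
# convenience symlink, assumed, mirroring my HBase setup
ln -s hama-0.3.0-incubating hama
```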





2. Config network settings: 

Run the following command to check your network settings:




Here is the content of my /etc/hosts:
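The file content was lost here; it is presumably the same /etc/hosts shown in my HBase install note on this machine:

```
127.0.0.1 lei.hadoop.local hbase localhost lei
192.168.5.62 home.lei.local
```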



3. Config HAMA


Add the following environment variable. 
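The variables themselves were lost in this copy; a sketch, assumed by mirroring the HADOOP_HOME setup shown in my HBase post (adjust the path to your install):

```shell
# Assumed, mirroring the HADOOP_HOME/PATH pattern from my Hadoop setup
export HAMA_HOME=/usr/local/hama
export PATH=$PATH:$HAMA_HOME/bin
```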




add the following line:
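The line itself was lost; in hama-env.sh on this machine it is most likely the same JAVA_HOME setting I use in my HBase install:

```shell
# Assumed: same JAVA_HOME as in my HBase setup on this Mac
export JAVA_HOME=/Library/Java/Home
```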



add the following settings to hama-site.xml:
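My hama-site.xml did not survive here. A sketch of the usual HAMA 0.3 settings: the property names follow the HAMA documentation, while the host names and port assume my setup from the HBase post and may have differed in my original file:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>lei.hadoop.local:40000</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://lei.hadoop.local:9000</value>
  </property>
  <property>
    <name>hama.zookeeper.quorum</name>
    <value>lei.hadoop.local</value>
  </property>
</configuration>
```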




Here is the content of groomservers:
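The file content is missing here; presumably it lists the groom server hosts one per line, mirroring the regionservers file from my HBase install (an assumption, since my original file was not preserved):

```
lei.hadoop.local
home.lei.local
```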



Start and Run Hadoop and HAMA:

1. start Hadoop: 





2. start HAMA:
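The commands were lost here; with HAMA 0.3 the BSP daemons are started with the start-bspd.sh script from the distribution (a sketch, assuming my /usr/local/hama layout):

```shell
cd /usr/local/hama
./bin/start-bspd.sh   # starts the BSPMaster and GroomServer daemons
jps                   # should now list BSPMaster and GroomServer
```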





3. Verify HAMA install:
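One way to verify the install is to run one of the bundled examples; a sketch, assuming the examples jar name matches the 0.3.0-incubating release:

```shell
cd /usr/local/hama
# run the bundled pi estimator example (jar name assumed from the release version)
./bin/hama jar hama-examples-0.3.0-incubating.jar pi
```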







4. stop HAMA and Hadoop: 
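The commands were lost; a sketch, assuming the stop-bspd.sh script from the HAMA distribution and my /usr/local/hama layout:

```shell
cd /usr/local/hama
./bin/stop-bspd.sh   # stop the HAMA daemons first
stop-all.sh          # then stop Hadoop
```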




You can find a useful troubleshooting link here: 
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment

Enjoy. 


Tuesday, February 14, 2012

Build and Run Hadoop Java program with Ant and Maven


Recently I got a chance to experiment with Ant and Maven to build and run a Hadoop Java program. I didn't have time to read the manuals, so here are the Ant and Maven scripts I used to build and run a modified WordCount program. 


I have hadoop-0.20.205.0 installed at /usr/local/hadoop20, with the environment variables set accordingly. The build target produces the jar file LeiBigTop-1.1.jar.


Here is the build.xml:
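The build.xml itself did not survive in this copy. A minimal sketch of what it looked like: the runWordCounts target, jar name, and main class come from the text below, but the paths and wiring are assumptions:

```xml
<project name="LeiBigTop" default="jar" basedir=".">
  <property environment="env"/>
  <property name="hadoop.home" value="${env.HADOOP_HOME}"/>

  <!-- compile against the installed Hadoop jars (assumed locations) -->
  <path id="classpath">
    <fileset dir="${hadoop.home}" includes="hadoop-core-*.jar"/>
    <fileset dir="${hadoop.home}/lib" includes="*.jar"/>
  </path>

  <target name="compile">
    <mkdir dir="build/classes"/>
    <javac srcdir="src" destdir="build/classes" classpathref="classpath"
           includeantruntime="false"/>
  </target>

  <target name="jar" depends="compile">
    <jar destfile="LeiBigTop-1.1.jar" basedir="build/classes"/>
  </target>

  <!-- input/output HDFS paths are assumed placeholders -->
  <target name="runWordCounts" depends="jar">
    <exec executable="${hadoop.home}/bin/hadoop">
      <arg line="jar LeiBigTop-1.1.jar com.lei.bigtop.hadoop.wordcount.CountWordsV2 input output"/>
    </exec>
  </target>
</project>
```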




Assuming you have Hadoop running correctly, you can run the following to build and run com.lei.bigtop.hadoop.wordcount.CountWordsV2. 


lei:LeiBigTop lei$ ant -version
Apache Ant(TM) version 1.8.2 compiled on December 20 2010


lei:LeiBigTop lei$ ant


lei:LeiBigTop lei$ ant runWordCounts 


Here is the pom.xml:
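The pom.xml content was lost here. A minimal sketch of an equivalent POM: the artifact coordinates follow the jar name from the text, and the hadoop-core dependency matches my installed version; everything else is an assumption:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- coordinates assumed from the LeiBigTop-1.1.jar name above -->
  <groupId>com.lei.bigtop</groupId>
  <artifactId>LeiBigTop</artifactId>
  <version>1.1</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- hadoop-core matching my installed hadoop-0.20.205.0 -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.205.0</version>
    </dependency>
  </dependencies>
</project>
```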






lei:LeiBigTop lei$ mvn -version
Apache Maven 3.0.3 (r1075438; 2011-02-28 09:31:09-0800)
Maven home: /usr/local/apache-maven-3.0.3
Java version: 1.6.0_29, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.6.8", arch: "x86_64", family: "mac"

lei:LeiBigTop lei$ mvn clean compile package verify




Enjoy.  

Monday, February 13, 2012

HBase Java Client to Scan and Display Table Content


I have been experimenting with HBase and wrote a small client program to scan and display the content of a table. It works with hbase-0.90.5 on hadoop-0.20.205.0. I am putting the code here for sharing. 
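The code did not make it into this copy of the post. A minimal sketch of such a scan-and-print client against the HBase 0.90 client API (the class name and the table-name argument are my assumptions; the original program may have differed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Scan a table and print every cell as: row family:qualifier = value
public class ScanTable {
    public static void main(String[] args) throws Exception {
        // picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, args[0]);  // table name from the command line
        ResultScanner scanner = table.getScanner(new Scan());
        try {
            for (Result row : scanner) {
                for (KeyValue kv : row.raw()) {
                    System.out.println(Bytes.toString(kv.getRow()) + " "
                            + Bytes.toString(kv.getFamily()) + ":"
                            + Bytes.toString(kv.getQualifier())
                            + " = " + Bytes.toString(kv.getValue()));
                }
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

Run it with the HBase and Hadoop jars on the classpath, e.g. `java -cp $(hbase classpath):. ScanTable test`.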




Enjoy.

Saturday, February 11, 2012

Install Pseudo-distributed HBase with Hadoop on Mac OS X


Here is a note from my technical journey installing pseudo-distributed HBase on Hadoop. I use Hadoop 0.20.205 (hadoop-0.20.205.0.tar.gz) and HBase 0.90 (hbase-0.90.5.tar.gz). You can download them from http://hadoop.apache.org/common/releases.html and http://www.apache.org/dyn/closer.cgi/hbase/

Prerequisites:
You need to have pseudo-distributed Hadoop installed and working first. In my case, I have installed Hadoop at /usr/local/hadoop20 and put Hadoop on the PATH: 

export HADOOP_HOME=/usr/local/hadoop20
export PATH=$PATH:$HADOOP_HOME/bin

Steps to Install HBase
1. Download hbase-0.90.5.tar.gz, place it at /usr/local, and untar it there

cd /usr/local

tar xvf hbase-0.90.5.tar.gz 

ln -s hbase-0.90.5 hbase


2. Config network settings: 

Run the command:

lei:lei$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet 127.0.0.1 netmask 0xff000000 
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 00:25:4c:e3:b9:7d 
media: autoselect
status: inactive
en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 00:25:00:4d:74:9f 
inet6 fe80::225:ff:fe4c:749e%en1 prefixlen 64 scopeid 0x5 
inet 192.168.5.62 netmask 0xffffff00 broadcast 192.168.5.255
media: autoselect
status: active
fw0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 4078
lladdr 00:25:4c:gg:gf:e3:b9:7d
media: autoselect <full-duplex>
status: inactive


Here is the content of my /etc/hosts:

$ cat /etc/hosts

127.0.0.1 lei.hadoop.local hbase localhost lei
192.168.5.62 home.lei.local


3. Config HBase

$vi hbase-env.sh

add the following line:

export JAVA_HOME=/Library/Java/Home

$vi hbase-site.xml

add the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>home.lei.local</value>
    <description></description>
  </property>

  <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>lei.hadoop.local</value>
    <description></description>
  </property>

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/zookeeper</value>
    <description></description>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://lei.hadoop.local:9000/hbase2</value>
    <description></description>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description></description>
  </property>

  <property>
    <name>hbase.master</name>
    <value>lei.hadoop.local:60000</value>
    <description></description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description></description>
  </property>
</configuration>


The hbase.rootdir entry needs to match fs.default.name in Hadoop's core-site.xml.

Here are my settings: 

lei:conf lei$ cat  ../../hadoop20/conf/core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://lei.hadoop.local:9000</value>
        <description></description>
    </property>
</configuration>


lei$ cat regionservers


lei.hadoop.local
home.lei.local


Start and Run Hadoop and HBase:

1. start Hadoop: 

lei:lei$ start-all.sh 

lei:lei$ hadoop fs -ls

lei:lei$ hadoop fs -mkdir /hbase2

lei:lei$ jps
84756 NameNode
84926 SecondaryNameNode
1412 
85095 Jps
84841 DataNode
lei:lei$ 

Note: for some reason, you need to copy the *.jar files under hadoop20/share/hadoop/lib/ to hbase/lib. 

2. start HBase:

lei:hbase lei$ ./bin/start-hbase.sh


lei:conf lei$ jps
84756 NameNode
84926 SecondaryNameNode
1412 
85439 Jps
85388 HRegionServer
85268 HQuorumPeer
84841 DataNode
85298 HMaster

3. Verify HBase:

lei:hbase lei$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec  9 05:40:36 UTC 2011

hbase(main):001:0> status
1 servers, 0 dead, 2.0000 average load

hbase(main):002:0> create 'test', 'cf'
0 row(s) in 3.9240 seconds

hbase(main):003:0> list 'test'
TABLE                                                                                                                                                                               
test                                                                                                                                                                                
1 row(s) in 0.0820 seconds

hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.3320 seconds

hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0490 seconds

hbase(main):006:0> scan 'test'
ROW                                            COLUMN+CELL                                                                                                                          
 row1                                          column=cf:a, timestamp=1328987613037, value=value1                                                                                   
 row2                                          column=cf:b, timestamp=1328987619457, value=value2                                                                                   
2 row(s) in 0.1020 seconds

hbase(main):007:0> disable 'test'
0 row(s) in 2.0980 seconds

hbase(main):008:0> drop 'test'
0 row(s) in 1.2400 seconds

hbase(main):009:0> exit
lei:hbase lei$ 

4. stop HBase and Hadoop: 

lei:hbase lei$ ./bin/stop-hbase.sh 
stopping hbase.....
home.lei.local: stopping zookeeper.
lei:hbase lei$ 


lei:hbase lei$ stop-all.sh 
no jobtracker to stop
lei.hadoop.local: no tasktracker to stop
stopping namenode
lei.hadoop.local: stopping datanode
lei.hadoop.local: stopping secondarynamenode
lei:hbase lei$ 

You can find a useful troubleshooting link here: http://wiki.apache.org/hadoop/Hbase/Troubleshooting

Enjoy.