Thursday, March 29, 2012

Improving BigTop's TestHadoopExamples.groovy with YAML and TestRunHadoopExamples.groovy

Thanks to feedback from Wing Yew and Roman, I set out to improve

./bigtop-tests/test-artifacts/hadoop/src/main/groovy/org/apache/bigtop/itest/hadoopexamples/TestHadoopExamples.groovy

using YAML, Groovy, and BigTop's existing itest infrastructure. YAML is better suited than XML for capturing a list of shell commands.

I made improvements in the following two areas:

1. Move the commands inside TestHadoopExamples.groovy into a YAML file, so that we do not need to recompile the code when adding or changing test cases.

2. Introduce a comparator that compares the actual output with the expected output to verify each command's execution. For example, when calculating Pi, make sure the end result matches the known value.

The code has been tested on Ubuntu 10.04 on AWS, and a JIRA has been submitted.
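To make the comparator idea concrete, here is a minimal plain-Java sketch of a substring-style check. This is a simplification of my own: the real code plugs in subclasses of org.apache.hadoop.cli.util.ComparatorBase (such as SubstringComparator) rather than this hypothetical class.

```java
public class SubstringCheck {
    // True when the expected text appears anywhere in the actual output,
    // mirroring the substring-style comparison used for the Pi test case.
    static boolean compare(String actualOutput, String expected) {
        if (actualOutput == null || expected == null) return false;
        return actualOutput.contains(expected);
    }

    public static void main(String[] args) {
        String actual = "Estimated value of Pi is 3.68";
        if (!compare(actual, "Pi is 3.68")) throw new AssertionError();
        if (compare(actual, "Pi is 3.14")) throw new AssertionError();
        System.out.println("ok");
    }
}
```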


Here are the test cases in TestHadoopExamples.groovy:
 
  static Map examples =
    [
        pi                :'20 10',
        wordcount         :"$EXAMPLES/text $EXAMPLES_OUT/wordcount",
        multifilewc       :"$EXAMPLES/text $EXAMPLES_OUT/multifilewc",
//        aggregatewordcount:"$EXAMPLES/text $EXAMPLES_OUT/aggregatewordcount 5 textinputformat",
//        aggregatewordhist :"$EXAMPLES/text $EXAMPLES_OUT/aggregatewordhist 5 textinputformat",
        grep              :"$EXAMPLES/text $EXAMPLES_OUT/grep '[Cc]uriouser'",
        sleep             :"-m 10 -r 10",
        secondarysort     :"$EXAMPLES/ints $EXAMPLES_OUT/secondarysort",
        randomtextwriter  :"-Dtest.randomtextwrite.total_bytes=1073741824 $EXAMPLES_OUT/randomtextwriter"
    ];


Here is the YAML content that covers all the test cases:
 

- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 5 5,
    commandComparator: echo "Pi is 3.68", comparatorClass: org.apache.hadoop.cli.util.SubstringComparator}
  postTestCommandList: []
  preTestCommandList: []
  testDesc: calculate pi using hadoop MR
  testName: calculate pi
- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /wordcount /wordcount_out,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: mkdir ./wordcount_out,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -get /wordcount_out/* ./wordcount_out,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -rmr /wordcount,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -rmr /wordcount_out/,
    commandComparator: null, comparatorClass: null}
  postTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'cat ./wordcount_out/*
      | grep  Roman | sed ''s/[^0-9.]*\([0-9.]*\).*/\1/''', commandComparator: cat wordcount/* | grep -c Roman,
    comparatorClass: org.apache.bigtop.itest.hadoopexamples.ExtactComparatorIgnoreWhiteSpace}
  preTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: rm -rf ./wordcount,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: rm -rf ./wordcount_out,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: mkdir ./wordcount,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'curl http://www.meetup.com/HandsOnProgrammingEvents/events/53837022/
      | sed -e :a -e ''s/<[^>]*>//g;/</N;//ba'' | sed ''s/&nbsp//g'' | sed ''s/^[
      \t]*//;s/[ \t]*$//''  | sed ''/^$/d'' | sed ''/"http[^"]*"/d'' > ./wordcount/content',
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -mkdir /wordcount,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -put ./wordcount/* /wordcount,
    commandComparator: null, comparatorClass: null}
  testDesc: count word in Hadoop MR
  testName: count word in MR
- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -rmr examples-output/wordcount,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount examples/text examples-output/wordcount,
    commandComparator: null, comparatorClass: null}
  postTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'hadoop
      fs -cat  examples-output/wordcount/part* | grep "Commission" | sed ''s/[^0-9]*\([0-9]\+\).*/\1/''
      | tr ''\n'' '' '' | sed "s/\(^.*$\)/\1\n/" | sed ''s/^[[:space:]]*//;s/[[:space:]]*$//''
      | sed -e ''s/ /+/g'' | bc', commandComparator: hadoop fs -cat  examples/text/* | grep -c "Commission",
    comparatorClass: org.apache.bigtop.itest.hadoopexamples.ExtactComparatorIgnoreWhiteSpace}
  preTestCommandList: []
  testDesc: countword in TestHadoopExamples.groovy
  testName: countword in MR
- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -rmr examples-output/multifilewc,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop jar $HADOOP_HOME/hadoop-examples-*.jar multifilewc examples/text examples-output/multifilewc,
    commandComparator: null, comparatorClass: null}
  postTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'hadoop
      fs -cat  examples-output/multifilewc/part* | grep "Ambassadors" | sed ''s/[^0-9]*\([0-9]\+\).*/\1/''
      | tr ''\n'' '' '' | sed "s/\(^.*$\)/\1\n/" | sed ''s/^[[:space:]]*//;s/[[:space:]]*$//''
      | sed -e ''s/ /+/g'' | bc', commandComparator: hadoop fs -cat  examples/text/* | grep -c "Ambassadors",
    comparatorClass: org.apache.bigtop.itest.hadoopexamples.ExtactComparatorIgnoreWhiteSpace}
  preTestCommandList: []
  testDesc: multifilewc in TestHadoopExamples.groovy
  testName: multifilewc test in MR
- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop fs -rmr examples-output/grep,
    commandComparator: null, comparatorClass: null}
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'hadoop
      jar $HADOOP_HOME/hadoop-examples-*.jar grep examples/text examples-output/grep   ''[Cc]uriouser''',
    commandComparator: null, comparatorClass: null}
  postTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: 'hadoop
      fs -cat examples-output/grep/part* | sed ''s/[0-9]*//g'' | sed ''s/Curiouser/curiouser/g''',
    commandComparator: echo "curiousercuriouser", comparatorClass: org.apache.bigtop.itest.hadoopexamples.ExtactComparatorIgnoreWhiteSpace}
  preTestCommandList: []
  testDesc: grep in TestHadoopExamples.groovy
  testName: grep in MR
- !!org.apache.bigtop.itest.hadoopexamples.BigTopIntegrationTest
  integrationTestCommandList:
  - !!org.apache.bigtop.itest.hadoopexamples.BigTopTestCommand {command: hadoop jar $HADOOP_HOME/hadoop-examples-*.jar sleep -m 10 -r 10,
    commandComparator: null, comparatorClass: null}
  postTestCommandList: []
  preTestCommandList: []
  testDesc: sleep in TestHadoopExamples.groovy
  testName: sleep in MR
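Each YAML test case carries three command lists. My assumption (the getCommandList() implementation is not shown here) is that the runner executes them in setup, test, verification/cleanup order: preTestCommandList, then integrationTestCommandList, then postTestCommandList. The sketch below only illustrates that three-phase ordering with plain strings, not the real BigTop API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the three-phase execution order encoded in the YAML:
// pre-test setup, the integration-test commands, then post-test checks.
public class PhaseOrder {
    static List<String> run(List<String> pre, List<String> test, List<String> post) {
        List<String> executed = new ArrayList<>();
        executed.addAll(pre);   // preTestCommandList
        executed.addAll(test);  // integrationTestCommandList
        executed.addAll(post);  // postTestCommandList
        return executed;
    }

    public static void main(String[] args) {
        List<String> order = run(List.of("mkdir ./wordcount"),
                                 List.of("hadoop jar ... wordcount"),
                                 List.of("cat ./wordcount_out/*"));
        if (!order.get(0).equals("mkdir ./wordcount")) throw new AssertionError();
        if (!order.get(2).equals("cat ./wordcount_out/*")) throw new AssertionError();
        System.out.println("ok");
    }
}
```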





Here is the content of TestRunHadoopExamples.groovy:
 


/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
* <p/>
* http://www.apache.org/licenses/LICENSE-2.0
* <p/>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.bigtop.itest.hadoopexamples

import java.util.Map;

import org.junit.Ignore
import org.junit.Test
import org.junit.runner.RunWith;
import org.junit.BeforeClass
import org.junit.runners.Parameterized.Parameters;

import static org.junit.Assert.assertTrue
import static org.junit.Assert.assertNotNull


import org.apache.bigtop.itest.junit.OrderedParameterized;
import org.apache.bigtop.itest.shell.Shell;


import org.apache.commons.logging.LogFactory
import org.apache.commons.logging.Log

import org.apache.hadoop.cli.util.ComparatorBase


import org.yaml.snakeyaml.Yaml

import org.apache.hadoop.conf.Configuration
import org.apache.bigtop.itest.JarContent



@RunWith(OrderedParameterized.class)
class TestRunHadoopExamples {
 
 static private Log LOG = LogFactory.getLog(TestRunHadoopExamples.class);
 static private String TEST_CASE_FILE_NAME = "./bigtop-testcases.yaml";
 //static private String TEST_CASE_FILE_NAME = "/home/ubuntu/bigtop/bigtop-tests/test-execution/smokes/hadoop/bigtop-testcases.yaml";
 static private String TEST_CASE_NAME_PREFIX = 'TestRunHadoopExamples';
 
 static private Shell sh = new Shell("/bin/bash -s");

 private static final String HADOOP_HOME = System.getenv('HADOOP_HOME');
 private static final String HADOOP_CONF_DIR = System.getenv('HADOOP_CONF_DIR');
 private static String hadoopExamplesJar = JarContent.getJarName(HADOOP_HOME, 'hadoop.*examples.*.jar');
 static {
  assertNotNull("HADOOP_HOME has to be set to run this test", HADOOP_HOME);
  assertNotNull("HADOOP_CONF_DIR has to be set to run this test", HADOOP_CONF_DIR);
  assertNotNull("Can't find hadoop-examples.jar file", hadoopExamplesJar);
 }
 private static Configuration conf;
 private static String HADOOP_OPTIONS;
 private static final String EXAMPLES = "examples";
 private static final String EXAMPLES_OUT = "examples-output";
  
 
 private String testName;
 private String testCaseString;
 

 private String stripOutLeadingBracket (String str) {
  if (str==null) return str;
  if (str.length()<2) return str;
  if (str.startsWith("[") && str.endsWith("]")) {
   return str.substring(1,str.length()-1)
  } else {
   return str;
  }
 }

 private static List<BigTopIntegrationTest> loadBigTopIntegrationTestCases (String fileName) {
  String fileContents = new File(fileName).text
  // SnakeYAML maps the !! type tags straight onto the test-case classes
  return new Yaml().load(fileContents)
 }
 
 @BeforeClass
 static void setUp() {
  String skipSetup = System.getProperty('bigtop.itest.skip.setup');
  if (skipSetup!=null && skipSetup.length()>0)
   return;
  
  LOG.info("Start setUp") 
  conf = new Configuration();
  conf.addResource('mapred-site.xml');
  HADOOP_OPTIONS = "-fs ${conf.get('fs.default.name')} -jt ${conf.get('mapred.job.tracker')}";
  // Unpack resource
  JarContent.unpackJarContainer(TestRunHadoopExamples.class, '.' , null)
  
  sh.exec("hadoop fs $HADOOP_OPTIONS -test -e $EXAMPLES");
  if (sh.getRet() == 0) {
   sh.exec("hadoop fs $HADOOP_OPTIONS -rmr -skipTrash $EXAMPLES");
   assertTrue("Deletion of previous $EXAMPLES from HDFS failed", sh.getRet() == 0);
  }
  sh.exec("hadoop fs $HADOOP_OPTIONS -test -e $EXAMPLES_OUT");
  if (sh.getRet() == 0) {
   sh.exec("hadoop fs $HADOOP_OPTIONS -rmr -skipTrash $EXAMPLES_OUT");
   assertTrue("Deletion of previous examples output from HDFS failed", sh.getRet() == 0);
  }
  
  // copy test files to HDFS
  sh.exec("hadoop fs $HADOOP_OPTIONS -put $EXAMPLES $EXAMPLES",
    "hadoop fs $HADOOP_OPTIONS -mkdir $EXAMPLES_OUT");
   assertTrue("Could not create output directory", sh.getRet() == 0);
 }
  
 
 @Parameters
 public static Map<String, Object[]> generateTests() {
  Map<String, Object[]> res = [:];
  List<BigTopIntegrationTest> testList = loadBigTopIntegrationTestCases (TEST_CASE_FILE_NAME);
  int count=1;
  
  for (BigTopIntegrationTest test : testList) {
   def nowCal = Calendar.instance
   String casename = "$TEST_CASE_NAME_PREFIX-$nowCal.time-$count"
   Object[] args = [ casename, new Yaml().dump(test) ]
   res.put( casename, args)
   count++;
  }
  return res;
 }
  
 public TestRunHadoopExamples (String name, String testDetail ) {
  testName = name;
  testCaseString = testDetail;
  displayMessage (["Test case name - $testName, args - $testCaseString"], false)
 }
 
 private void displayMessage (def message, boolean error) {
  if (message!=null) {
   if (error) 
    message.each() { LOG.error "${it}" };
   else 
    message.each() { LOG.info "${it}" };
  }
 }

 public boolean runExample(BigTopIntegrationTest test) {
  boolean success = true
  
  for ( BigTopTestCommand command: test.getCommandList() ) {
   displayMessage (["Shell command ["  + command.getCommand() + "]"], false);
   sh.exec(command.getCommand());
   String stdout = sh.getOut();
   String shReturnCode = sh.getRet()
   String shStdErr = sh.getErr();
   
   if ( command.getComparatorClass() !=null && command.getCommandComparator()!=null && command.getCommandComparator().trim().length()>0) {
    ["ComparatorClass - " + command.getComparatorClass(), "CommandComparator - " + command.getCommandComparator(), "Shell CommandComparator ["  + command.getCommandComparator() + "]"].each() { LOG.info "${it}" };
    sh.exec(command.getCommandComparator());
    String expectedOutput = sh.getOut();
    displayMessage (["CommandComparator return code is $shReturnCode, Output is $expectedOutput"], false);

    String comparatorClassName = command.getComparatorClass();
    ComparatorBase compare = BigTopIntegrationTestFacade.getComparatorClass(comparatorClassName);
    def resultDisplay = []
    if (compare==null) {
     resultDisplay.add("Error! No such ComparatorClass - $comparatorClassName");
     success = false;
    } else {
     if (stdout.length()>=2 && expectedOutput.length()>=2 ) {
      boolean ret = compare.compare( stripOutLeadingBracket (stdout) , stripOutLeadingBracket(expectedOutput) );
      resultDisplay = (ret) ? ["SUCCESS! actual output - $stdout, expected - $expectedOutput, compare class - $comparatorClassName" ] : ["FAIL! actual output - $stdout,  expected - $expectedOutput, compare class - $comparatorClassName"] 
      if (!ret) success = false
     } else {
      resultDisplay.add("Error! No output to compare. ");
      success = false;
     }
    }
    displayMessage (resultDisplay, success);

   } else {
    def resultDisplay = (sh.getRet()==0) ? ["Command return code - $shReturnCode, Output - $stdout" ] : ["Command return code - $shReturnCode,  Output - $stdout, Error output is $shStdErr" ]
    displayMessage (resultDisplay, false)
   }
  }

  return success
 }
  
 @Test
 public void testHadoopMapReduceExample() {
  LOG.info( "testHadoopMapReduceExample() - " + testName);
  
  BigTopIntegrationTest test = new Yaml().load(testCaseString)

  LOG.info("Test case name [" + test.getTestName() + "]");
  LOG.info("Test case description [" + test.getTestDesc() + "]");
  LOG.info("Test case details - " + test.toString());

  assertTrue("Test failed: " + test.getTestName(), runExample(test));
 }
 
}
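A note on the stripOutLeadingBracket helper: Shell.getOut() returns a list of output lines, so coercing it to a String (as the runner does) yields a bracketed form such as "[3.68]", and the brackets must be peeled off before comparison. The same logic in plain Java:

```java
// Plain-Java equivalent of the stripOutLeadingBracket helper above.
// A Groovy List's toString() wraps its contents in "[...]", which has to
// be removed before comparing against the comparator command's output.
public class BracketStrip {
    static String stripOutLeadingBracket(String str) {
        if (str == null || str.length() < 2) return str;
        if (str.startsWith("[") && str.endsWith("]")) {
            return str.substring(1, str.length() - 1);
        }
        return str;
    }

    public static void main(String[] args) {
        if (!stripOutLeadingBracket("[3.68]").equals("3.68")) throw new AssertionError();
        if (!stripOutLeadingBracket("3.68").equals("3.68")) throw new AssertionError();
        System.out.println("ok");
    }
}
```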

Enjoy the journey!

Sunday, March 18, 2012

Build Apache BigTop 0.3.0 at AWS

Well, after spending about 8 hours, I finally got BigTop 0.3.0 to build on AWS. It was less painful than my BigTop 2.0 build experience. Most of the time is just waiting, watching the screen scroll. To make the experience more enjoyable, I put together a YouTube music playlist and let it roll while waiting.

I put down some notes here to remind myself, or anyone who cares to travel the same path.

1. Selecting the right AMI (ami-31bc7758) and the right AWS instance size (large) is worth the time saved (credits go to Doug and Ron).

2. Need to install JDK 6.x and JDK 5.x.

3. Need to install Maven 3.x

4. Need to install apache-forrest-0.8

5. Need to follow Bikramjit's note.

6. Need to set  MAVEN_OPTS="-Xms1024m -Xmx2048m"

7. Here are my environment variables for the BigTop build:



After the build, I did the test on the Hadoop install.

Here are my conf settings:

mapred-site.xml:


hdfs-site.xml:



core-site.xml:


To perform the BigTop smoke test, I did the following steps:

$cp ./build/hadoop/deb/hadoop-1.0.1/src/test/org/apache/hadoop/cli/testConf.xml /home/ubuntu/bigtop/bigtop-tests/test-execution/smokes/hadoop/target/clitest_data/testConf.xml
$cd ~/bigtop/bigtop-tests/test-artifacts
$mvn install

$cd ~/bigtop/bigtop-tests/test-execution/common
$mvn install


$cd ~/bigtop/bigtop-tests/test-execution/conf
$mvn install

$cd ~/bigtop/bigtop-tests/test-execution/smokes/hadoop

add the following to pom.xml:

<dependency>
<groupId>org.apache.cxf</groupId>
<artifactId>cxf-rt-frontend-jaxrs</artifactId>
<version>2.5.0</version>
</dependency>

$mvn -Dhadoop.log.dir=target/build/test/logs verify > lei4.output

Note: when I first ran mvn verify, it failed pretty badly; I then dug into the code to see what needed to be set. The command above is the result.

Here is the content of the output file:


You can see that for cli.TestCLI:
# Tests pass: 152 (98%)
# Tests fail: 2 (1%)

I cannot figure out why those 2 test cases failed.

org.apache.bigtop.itest.hadoopexamples.TestHadoopExamples passed:
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 491.765 sec

org.apache.bigtop.itest.hadoopsmoke.TestHadoopSmoke passed:
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 89.924 sec



Enjoy!

Tuesday, March 13, 2012

Simple MapReduce program using Hadoop and HBase

I have a set of input data consisting of key-value pairs generated by another Java program. The data look like the following:

...
j 9.0
e 4.0
i 8.0
b 9.0
g 25.0
e 88.0
f 32.0
...

I wrote a very simple MapReduce program that uses HBase to import the data into a table and calculate the sum for each key. I am using Hadoop 0.20.205.0 and HBase 0.90.5.

 

 
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// JavaUtil and HbaseUtil are my own helper classes

public class HBaseCalSum {

 static private String TABLE_NAME = "cal_sum";
 static private String COLUMN_NAME = "content";
 static private String COLUMN_KEY_NAME = "sum";
 
    public static class HBaseCalSumMap extends Mapper<LongWritable,Text,Text, DoubleWritable>
    {
        
        @Override
        public void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException
        {
            String line = value.toString();

            if ( JavaUtil.isStringKeyNumSeparatedBySpace(line) )  {
             StringTokenizer st = new StringTokenizer(line);
             String k = (String) st.nextElement();
             Double d = Double.valueOf( (String)st.nextElement());
             context.write(new Text(k), new DoubleWritable(d));
            }

    
        }
    }

    public static class CalSumReduce extends TableReducer<Text, DoubleWritable, NullWritable>
    {
        @Override
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;  // note: each DoubleWritable is truncated to an int by the compound assignment below
            for(DoubleWritable v : values)
            {
                sum += v.get();
            }
           
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes(COLUMN_NAME), Bytes.toBytes(COLUMN_KEY_NAME), Bytes.toBytes(String.valueOf(sum)));
            context.write(NullWritable.get(), put);
        }
    }

    public static void main(String args[]) throws Exception
    {
     if (args.length<1) {
      System.err.println("Usage: HBaseCalSum <path> ");
      System.exit(-1);
     }
     

        
        Configuration conf = new Configuration();
        conf.set(TableOutputFormat.OUTPUT_TABLE, TABLE_NAME);
        HbaseUtil.createHBaseTable(TABLE_NAME, COLUMN_NAME);

        String input = args[0];
        Job job = new Job(conf, "HBaseCalSum table with " + input);
       
        job.setJarByClass(HBaseCalSum.class);
        job.setMapperClass(HBaseCalSumMap.class);
        job.setReducerClass(CalSumReduce.class);
       
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
       
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(input));

        int retJob = job.waitForCompletion(true)?0:1;
        
        System.exit(retJob);

    }
}
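The heart of the job is the reducer's per-key summation. Because sum is declared as an int, the compound assignment truncates each double value before adding it, which is why the scan results later show integer totals. Here is the aggregation logic in isolation, keeping that same truncation behavior:

```java
import java.util.HashMap;
import java.util.Map;

// The reducer's per-key aggregation in isolation. As in CalSumReduce above,
// doubles are accumulated into an int, so each value is truncated when added.
public class PerKeySum {
    static Map<String, Integer> sum(String[][] pairs) {
        Map<String, Integer> totals = new HashMap<>();
        for (String[] p : pairs) {
            int v = (int) Double.parseDouble(p[1]); // truncation, as in the reducer
            totals.merge(p[0], v, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample rows from the input data shown earlier
        String[][] data = { {"e", "4.0"}, {"e", "88.0"}, {"g", "25.0"} };
        Map<String, Integer> t = sum(data);
        if (t.get("e") != 92 || t.get("g") != 25) throw new AssertionError();
        System.out.println(t);
    }
}
```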


 
$ ./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec  9 05:40:36 UTC 2011

hbase(main):001:0> scan 'cal_sum'
ROW                                            COLUMN+CELL                                                                                                                          
 a                                             column=content:sum, timestamp=1331703862041, value=1304                                                                              
 b                                             column=content:sum, timestamp=1331703862041, value=1362                                                                              
 c                                             column=content:sum, timestamp=1331703862041, value=864                                                                               
 d                                             column=content:sum, timestamp=1331703862041, value=1602                                                                              
 e                                             column=content:sum, timestamp=1331703862041, value=1710                                                                              
 f                                             column=content:sum, timestamp=1331703862041, value=491                                                                               
 g                                             column=content:sum, timestamp=1331703862041, value=920                                                                               
 h                                             column=content:sum, timestamp=1331703862041, value=1353                                                                              
 i                                             column=content:sum, timestamp=1331703862041, value=1274                                                                              
 j                                             column=content:sum, timestamp=1331703862041, value=1215                                                                              
 k                                             column=content:sum, timestamp=1331703862041, value=869                                                                               
 l                                             column=content:sum, timestamp=1331703862041, value=1461                                                                              
 m                                             column=content:sum, timestamp=1331703862041, value=1108                                                                              
13 row(s) in 1.2870 seconds

hbase(main):002:0> 


Here is the Java code for HTable scan:
 

 public static void scanTable(String strConf, String tableName, String colName, String keyName) throws Exception {
  Configuration conf; 
  conf = HBaseConfiguration.create();
  conf.addResource(new Path(strConf)); // "/usr/local/hbase/conf/hbase-site.xml"
  
  Scan s = new Scan();
        HTable table = new HTable(conf, tableName);

  s.addColumn(Bytes.toBytes(colName), Bytes.toBytes(keyName));
  ResultScanner scanner = table.getScanner(s);
  try {
    // Scanners return Result instances.
    // Now, for the actual iteration. One way is to use a while loop like so:
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
      // print out the row we found and the columns we were looking for
     System.out.println("Found row: " + rr);
     for (KeyValue kv : rr.raw()) {
      
      System.out.println("Found key: " + Bytes.toStringBinary ( kv.getKey(), 2, kv.getRowLength() ) + " Count: " + Bytes.toStringBinary( kv.getValue() ) );
     }
    }

  } finally {
    // Make sure you close your scanners when you are done!
    // Thats why we have it inside a try/finally clause
    scanner.close();
  }

 }



Run and test this Simple MapReduce program with HBase

I installed both Hadoop and HBase in pseudo-distributed (single server) mode. Please refer to my other blog post on how to install HBase. Or better yet, use BigTop.

Also, I modified hadoop-env.sh to add the HBase and ZooKeeper jars to HADOOP_CLASSPATH:

 
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.90.5.jar:$HBASE_HOME/hbase-0.90.5-tests.jar:$HBASE_HOME/conf:${HBASE_HOME}/lib/zookeeper-3.3.2.jar

1) Start Hadoop in pseudo-distributed (single server) mode.
2) Start HBase in pseudo-distributed (single server) mode.

I ran jps to verify that all Hadoop and HBase processes are running:



 
home:lei$ jps

9968 HQuorumPeer
9999 HMaster
9632 DataNode
10354 Jps
10082 HRegionServer
9543 NameNode
9720 SecondaryNameNode


3) Run the following commands:
 
home:lei$hadoop fs -rmr ./data
home:lei$hadoop fs -mkdir ./data
home:lei$hadoop fs -put ./data/* ./data
home:lei$hadoop jar ./target/LeiBigTop-1.1.jar  com.lei.bigtop.hbase.calsum.HBaseCalSum ./data


I tested the code on my Mac 10.6.8, Ubuntu 11.10 and AWS EC2 (Ubuntu 11.x).


Here is the ant target to run the program:
 

 <target name="hbaseCalCum" description="Simple MapReduce program with HBase">
  <property name="myclasspath" refid="run.class.path"/>
  <echo message="Simple MapReduce program with HBase"/>
  <echo message="Classpath = ${myclasspath}"/>
  <echo>+--------------------------------------------------------+</echo>
  <echo>+ Program: com.lei.bigtop.hbase.calsum.HBaseCalSum       +</echo>
  <echo>+ Input: ./data                                          +</echo>
  <echo>+--------------------------------------------------------+</echo>
  <echo>${user.dir}</echo>
  <exec executable="hadoop" dir=".">
   <arg value="fs"/>
   <arg value="-rmr"/>
   <arg value="${user.dir}/data"/>
  </exec>
  <exec executable="hadoop" dir=".">
   <arg value="fs"/>
   <arg value="-mkdir"/>
   <arg value="${user.dir}/data"/>
  </exec>
  <exec executable="hadoop" dir=".">
   <arg value="fs"/>
   <arg value="-put"/>
   <arg value="${user.dir}/data/input.txt"/>
   <arg value="${user.dir}/data"/>
  </exec>
  <exec executable="hadoop" dir=".">
   <arg value="jar"/>
   <arg value="${basedir}/${dest.dir.release}/${target.jar}"/>
   <arg value="com.lei.bigtop.hbase.calsum.HBaseCalSum"/>
   <arg value="${user.dir}/data"/>
  </exec>
  <exec executable="hadoop" dir=".">
   <arg value="jar"/>
   <arg value="${basedir}/${dest.dir.release}/${target.jar}"/>
   <arg value="com.lei.bigtop.hbase.util.HBaseScanTable"/>
  </exec>
 </target>


Well, the journey's next stop will be "Test HBase/Hadoop Installation with a simple MapReduce program with HBase"

Tuesday, March 6, 2012

An Implementation of Integration Tests for BigTop with a New XML Schema

Here are the implementation details for the proposed improvement of the BigTop integration tests.

1. The XML parser for the test suite is in BigTopTestSuiteXML.groovy:

 
 public static List<BigTopIntegrationTestInterface> readTestSuiteFromXMLv2(String fileName) {
  
  List<BigTopIntegrationTestInterface> testCaseList = new ArrayList<BigTopIntegrationTestInterface>();
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  try {
   Document document = dbf.newDocumentBuilder().parse(new File(fileName));
    
   Node previousNode = null;
   BigTopIntegrationTestInterface currentTestCase = null;
      
   NodeList nodeList = document.getElementsByTagName("command");
   for (int i = 0; i < nodeList.getLength(); i++) {
    Node currentnode = nodeList.item(i);
    if (  ! (currentnode.getNodeType() == Node.ELEMENT_NODE) ) {
     break;
    }
    if (currentnode.getParentNode()==null || 
     currentnode.getParentNode().getParentNode()==null || 
     currentnode.getParentNode().getParentNode().getParentNode()==null) {
     break;
    }
    
    Node grandparentNode = currentnode.getParentNode().getParentNode();
    
    if (grandparentNode.getParentNode()!=previousNode) {
     if (currentTestCase!=null) testCaseList.add(currentTestCase);
     previousNode = grandparentNode.getParentNode();
     currentTestCase = BigTopIntegrationTestFactory.getInstance().createTestCase();
    }

    if (currentTestCase!=null && grandparentNode!=null) {
     BigTopTestCommandInterface command = BigTopIntegrationTestFacade.getInstance().setTestCaseSuiteDetail (currentTestCase, 
      grandparentNode.getNodeName()+":"+currentnode.getNodeName(), currentnode.getTextContent());
     if ( currentnode.getNextSibling() !=null && command!=null) {
      Node sib = currentnode.getNextSibling();
      while ( sib!=null) {
       if (  ( sib.getNodeType() == Node.ELEMENT_NODE ) ) {
        BigTopIntegrationTestFacade.getInstance().setTestCommandDetail(command, sib.getNodeName(), sib.getTextContent());
       }
       sib = sib.getNextSibling();
      }
     }
    }

    NodeList childNodeList =  grandparentNode.getParentNode().getChildNodes();
  
    for (int j = 0; j < childNodeList.getLength(); j++) {
     Node node2 = childNodeList.item(j);
     if (  ( node2.getNodeType() == Node.ELEMENT_NODE ) ) {
      BigTopIntegrationTestFacade.getInstance().setTestCaseSuiteDetail(currentTestCase, node2.getNodeName(), node2.getTextContent());
     }
    }
   }
   if (currentTestCase!=null) testCaseList.add(currentTestCase);
   
   } catch(ParserConfigurationException pce) {
    pce.printStackTrace();
   } catch(SAXException se) {
    se.printStackTrace();
   } catch(IOException ioe) {
    ioe.printStackTrace();
   }
     
   return testCaseList;
  }
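The parser works by locating every command element and walking up the parent links to find the node that encloses the whole test case. Here is a self-contained sketch of that upward navigation using the stdlib DOM API; the XML element names are made up for illustration and are not the actual schema:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Demonstrates walking up from a <command> element to the nodes that
// enclose it, as the XML parser above does via getParentNode() chains.
public class GrandparentDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<suite><testcase><commands><step>"
                   + "<command>hadoop jar examples.jar pi 5 5</command>"
                   + "</step></commands></testcase></suite>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Node command = doc.getElementsByTagName("command").item(0);
        // parent = <step>, grandparent = <commands>, great-grandparent = <testcase>
        Node grandparent = command.getParentNode().getParentNode();
        if (!grandparent.getNodeName().equals("commands")) throw new AssertionError();
        if (!grandparent.getParentNode().getNodeName().equals("testcase")) throw new AssertionError();
        System.out.println("ok");
    }
}
```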



2. BigTopIntegrationTestFacade.java sets values on the TestCase and TestCommand objects.

 
 static private BigTopIntegrationTestFacade self=null;
 static private Map<String, ComparatorBase> mapCompareClassMap = null;
    final static private Map<String, SetTestCommandDetail>  commandNameToValueMapping = new HashMap<String, SetTestCommandDetail>() { 
  private static final long serialVersionUID = -1L;
  {
      put("command-comparator-type",  new SetTestComparatorClass() );
      put("command-comparator-compare-to",  new SetTestCommandComparator() );
     }
    };

    private static class SetTestComparatorClass implements SetTestCommandDetail {
     public void setTestCommandDetail(BigTopTestCommandInterface command, String value) {
      command.setComparatorClass(value);
     }
    }

    private static class SetTestCommandComparator implements SetTestCommandDetail {
     public void setTestCommandDetail(BigTopTestCommandInterface command, String value) {
      command.setCommandComparator(value);
     }
    }

 public static BigTopIntegrationTestFacade getInstance() {
  if (self == null) {
   synchronized (BigTopIntegrationTestFacade.class) {
    if (self == null) {
     self = new BigTopIntegrationTestFacade();
     mapCompareClassMap = new HashMap<String, ComparatorBase>();
    }
   }
  }
  return self;
 }

    public void setTestCommandDetail(BigTopTestCommandInterface command, String name, String value) {
     if (command==null) return;
     SetTestCommandDetail setHandler = commandNameToValueMapping.get(name);
     if (setHandler==null) return;
     setHandler.setTestCommandDetail(command, value);
    }

    
 private BigTopIntegrationTestFacade () {}

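A side note on getInstance(): double-checked locking is easy to get subtly wrong under the Java memory model (the field should be volatile, and the braces must cover every statement guarded by the inner null check). A simpler thread-safe alternative is the initialization-on-demand holder idiom; this is a minimal sketch with a hypothetical class name, not the code used above:

```java
public class FacadeSketch {
    private FacadeSketch() {}

    // The JVM's class-initialization guarantees make this lazy and
    // thread-safe without explicit synchronization or volatile:
    // Holder is not initialized until getInstance() first touches it.
    private static class Holder {
        static final FacadeSketch INSTANCE = new FacadeSketch();
    }

    public static FacadeSketch getInstance() {
        return Holder.INSTANCE;
    }
}
```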


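The facade's map of element names to setter objects is a small dispatch-table pattern. A condensed, self-contained sketch of the same idea follows; the class and field names are stand-ins for the real interfaces, and it uses Java 8 lambdas for brevity where the original targets Java 1.6 with named inner classes:

```java
import java.util.HashMap;
import java.util.Map;

public class HandlerMapSketch {
    // Stand-in for BigTopTestCommandInterface.
    static class Cmd {
        String comparatorClass;
        String commandComparator;
    }

    interface Setter { void apply(Cmd c, String value); }

    // One entry per recognized XML element name.
    static final Map<String, Setter> SETTERS = new HashMap<>();
    static {
        SETTERS.put("command-comparator-type", (c, v) -> c.comparatorClass = v);
        SETTERS.put("command-comparator-compare-to", (c, v) -> c.commandComparator = v);
    }

    // Unknown element names are silently ignored, as in the facade.
    static void set(Cmd c, String name, String value) {
        Setter s = SETTERS.get(name);
        if (s != null) s.apply(c, value);
    }
}
```

This keeps the XML reader free of per-element if/else chains: adding a new element means adding one map entry.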


3. RunHadoopTestFromXMLFile.groovy runs the test commands listed in ./bigtop-testcases.xml.
 
 static Shell sh = new Shell("/bin/bash -s");
 
 public static void runTestInXMLFile(String fileName) {
  List<BigTopIntegrationTestInterface> testCaseList = BigTopTestSuiteXML.readTestSuiteFromXML(fileName);
  System.out.println("Run test suite in XML file [" + fileName + "]");
  for (BigTopIntegrationTestInterface t: testCaseList) {
   //System.out.println( t );
   println("Run test case name [" + t.getTestName() + "]");
    println("Run test case description [" + t.getTestDescription() + "]");
   
   for ( BigTopTestCommandInterface command: t.getCommandList() ) {
    println "Command line ["  + command.getCommand() + "]"
    
    sh.exec(command.getCommand());
    String stdout = sh.getOut();
    
    if ( command.getComparatorClass() !=null && command.getCommandComparator()!=null ) {
     System.out.println( "ComparatorClass - " + command.getComparatorClass() );
     System.out.println( "CommandComparator - " + command.getCommandComparator() );
     println "CommandComparator line ["  + command.getCommandComparator() + "]"
     sh.exec(command.getCommandComparator());
     String stdout2 = sh.getOut();
     System.out.println( "CommandComparator return code is " + sh.getRet() + " Output is " + stdout2 );
     ComparatorBase compare = BigTopIntegrationTestFacade.getInstance().getComparatorClass(command.getComparatorClass());
     if (compare==null) {
      System.out.println( "No such ComparatorClass - " + command.getComparatorClass() );
     } else {
      if (stdout.length()>=2 && stdout2.length()>=2 ) {
       boolean ret = compare.compare( stripOutLeadingBracket (stdout) , stripOutLeadingBracket(stdout2) );
       if (ret) {
        System.out.println( "  SUCCESS! \n  actual output - " + stdout + " \n  expected -" + stdout2 + "  \n  compare class - "+ command.getComparatorClass());
       } else {
        System.out.println( "  FAIL! \n  actual output - " + stdout + " \n  expected -" + stdout2 + "  \n  compare class - "+ command.getComparatorClass());
       }
      }
     }
    } else {
    
     if (sh.getRet()==0)
      System.out.println( "SUCCESS! return code is " + sh.getRet() + " Output is " + stdout);
     else 
      System.out.println( "FAIL! return code is " + sh.getRet() + " Output is " + stdout);
     

    }
    
   }
  }
 }

 public static void main(String[] args) {
  if (args==null || args.length==0 ) {
   runTestInXMLFile("./bigtop-testcases.xml");
  } else {
   for (String filename : args) {
    runTestInXMLFile(filename);
   }
  }

 }
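The verification flow above, run the test command, run the comparator command, then compare the two captured outputs, can be sketched without the itest Shell class as follows. The class and method names here are hypothetical, and the substring check only mimics what org.apache.hadoop.cli.util.SubstringComparator does:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class VerifyFlowSketch {
    // Run a shell command and capture its stdout
    // (a simplified stand-in for itest's Shell.exec/getOut).
    static String run(String cmd) throws Exception {
        Process p = new ProcessBuilder("bash", "-c", cmd).start();
        InputStream in = p.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        p.waitFor();
        return out.toString("UTF-8").trim();
    }

    // Expected output is a substring of the actual output.
    static boolean substringCompare(String actual, String expected) {
        return actual.contains(expected);
    }

    public static void main(String[] args) throws Exception {
        String actual = run("echo 'Estimated value of Pi is 3.68'");
        String expected = run("echo 'Pi is 3.68'");
        System.out.println(substringCompare(actual, expected) ? "SUCCESS" : "FAIL");
    }
}
```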


4. ExtactComparatorIgnoreWhiteSpace.java compares strings while ignoring all whitespace and commas.
 
package com.lei.bigtop.hadoop.integration.test;

public class ExtactComparatorIgnoreWhiteSpace extends org.apache.hadoop.cli.util.ExactComparator {
 public boolean compare(String actual, String expected) {
  if (actual==null || expected==null) return false;
  String actual2 = actual.replaceAll("\\s", "").replaceAll(",", "");
  String expected2 = expected.replaceAll("\\s", "").replaceAll(",", "");
  return super.compare(actual2, expected2);
 }
}
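To illustrate the normalization without the Hadoop dependency, here is a standalone sketch that performs the same stripping and then a plain equality check in place of ExactComparator:

```java
public class WhitespaceInsensitiveCompare {
    // Remove all whitespace and commas from both sides, then compare exactly.
    public static boolean compare(String actual, String expected) {
        if (actual == null || expected == null) return false;
        String a = actual.replaceAll("\\s", "").replaceAll(",", "");
        String e = expected.replaceAll("\\s", "").replaceAll(",", "");
        return a.equals(e);
    }
}
```

This makes output like "1,024 bytes" match "1024bytes", which is handy when Hadoop's human-readable formatting differs from the expected fixture.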

5. Finally, the project POM file:
 
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.lei.bigtop</groupId>
  <artifactId>LeiBigTop</artifactId>
  <version>1.1</version>
  <packaging>jar</packaging>

  <name>LeiBigTop</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>

    <dependency>
      <groupId>net.sf.json-lib</groupId>
      <artifactId>json-lib</artifactId>
      <version>2.4</version>
      <classifier>jdk15</classifier>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.bigtop.itest</groupId>
      <artifactId>itest-common</artifactId>
      <version>0.3.0-incubating-SNAPSHOT</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.205.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-test</artifactId>
      <version>0.20.205.0</version>
    </dependency>

    <dependency>
      <groupId>org.codehaus.groovy.maven.runtime</groupId>
      <artifactId>gmaven-runtime-1.6</artifactId>
      <version>1.0</version>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.4.3</version>
    </dependency>

  </dependencies>

  <build>

    <sourceDirectory>src</sourceDirectory>
    <directory>target</directory>

    <plugins>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.codehaus.groovy.maven</groupId>
        <artifactId>gmaven-plugin</artifactId>
        <version>1.0</version>
        <executions>
          <execution>
            <phase>compile</phase>
            <goals>
              <goal>generateStubs</goal>
              <goal>compile</goal>
              <goal>generateTestStubs</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <sources>
                <fileset>
                  <directory>${pom.basedir}/src/com/lei/bigtop/hadoop/test</directory>
                  <includes>
                    <include>**/*.groovy</include>
                  </includes>
                </fileset>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.2.1</version>
        <executions>
          <execution>
            <phase>integration-test</phase>
            <goals>
              <goal>java</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <mainClass>com.lei.bigtop.hadoop.test.RunHadoopTestFromXMLFile</mainClass>
          <!--
          <mainClass>com.lei.bigtop.hadoop.test.RunHadoopTestFromPropFile</mainClass>
          -->
        </configuration>
      </plugin>

    </plugins>
  </build>

</project>



Well, that is it. Enjoy the journey! If you want the source, send me a request and I will let you know where to get it.