Load MapReduce output data into HBase


Problem Description

For the last few days I've been experimenting with Hadoop. I'm running Hadoop in pseudo-distributed mode on Ubuntu 12.10 and have successfully executed some standard MapReduce jobs.

Next I wanted to start experimenting with HBase. I've installed HBase and played around in the shell a bit. That all went fine, so I wanted to experiment with HBase through a simple Java program: import the output of one of the previous MapReduce jobs and load it into an HBase table. I wrote a Mapper that should produce HFileOutputFormat files, which should be easy to read into an HBase table.

Now, whenever I run the program (using hadoop jar [compiled jar]), I get a ClassNotFoundException. The program seems unable to resolve com.google.common.primitives.Longs. Of course, I thought it was just a missing dependency, but the JAR (Google's Guava) is there.

I've tried a lot of different things but can't seem to find a solution.

I've attached the exception that occurs and the most important classes. I would truly appreciate it if someone could help me out or give me some advice on where to look.

Kind regards, Pieterjan

ERROR

12/12/13 09:02:54 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/13 09:03:00 INFO mapred.JobClient: Running job: job_201212130304_0020
12/12/13 09:03:01 INFO mapred.JobClient:  map 0% reduce 0%
12/12/13 09:04:07 INFO mapred.JobClient:  map 100% reduce 0%
12/12/13 09:04:51 INFO mapred.JobClient: Task Id : attempt_201212130304_0020_r_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1554)
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1536)
    at java.util.TreeMap.compare(TreeMap.java:1188)
    at java.util.TreeMap.put(TreeMap.java:531)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:63)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:40)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

JAVA
Mapper:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TestHBaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //Tab delimiter \t, white space delimiter: \\s+
        String[] s = value.toString().split("\t");
        Put put = new Put(s[0].getBytes());
        put.add("amount".getBytes(), "value".getBytes(), value.getBytes());
        context.write(new ImmutableBytesWritable(Bytes.toBytes(s[0])), put);
    }
}

Job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;

public class TestHBaseRun extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        try {
            Configuration configuration = getConf();

            Job hbasejob = new Job(configuration);
            hbasejob.setJobName("TestHBaseJob");
            hbasejob.setJarByClass(TestHBaseRun.class);

            //Specifies the InputFormat and the path.
            hbasejob.setInputFormatClass(TextInputFormat.class);
            TextInputFormat.setInputPaths(hbasejob, new Path("/hadoopdir/user/data/output/test/"));

            //Set Mapper, MapperOutputKey and MapperOutputValue classes.
            hbasejob.setMapperClass(TestHBaseMapper.class);
            hbasejob.setMapOutputKeyClass(ImmutableBytesWritable.class);
            hbasejob.setMapOutputValueClass(Put.class);

            //Specifies the OutputFormat and the path. If the path exists it's reinitialized.
            //In this case HFiles, which can be imported into HBase, are produced.
            hbasejob.setOutputFormatClass(HFileOutputFormat.class);
            FileSystem fs = FileSystem.get(configuration);
            Path outputpath = new Path("/hadoopdir/user/data/hbase/table/");
            fs.delete(outputpath, true);
            HFileOutputFormat.setOutputPath(hbasejob, outputpath);

            //Check if the table exists in HBase and create it if necessary.
            //HBaseUtil is a custom helper class, not part of the standard HBase API.
            HBaseUtil util = new HBaseUtil(configuration);
            if (!util.exists("test")) {
                util.createTable("test", new String[]{"amount"});
            }

            //Read the existing (or thus newly created) table.
            Configuration hbaseconfiguration = HBaseConfiguration.create(configuration);
            HTable table = new HTable(hbaseconfiguration, "test");

            //Write HFiles to disk. Autoconfigures partitioner and reducer.
            HFileOutputFormat.configureIncrementalLoad(hbasejob, table);

            boolean success = hbasejob.waitForCompletion(true);

            //Load the generated files into the table.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(hbaseconfiguration);
            loader.doBulkLoad(outputpath, table);

            return success ? 0 : 1;
        } catch (Exception ex) {
            System.out.println("Error: " + ex.getMessage());
        }
        return 1;
    }
}
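The question doesn't show the entry point that launches this Tool. A minimal driver sketch (the class name TestHBaseDriver is hypothetical, not from the question) would typically look like this and be run with hadoop jar [compiled jar]:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver; the question does not show how TestHBaseRun is invoked.
public class TestHBaseDriver {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options (-D, -libjars, ...) before calling run().
        int exitCode = ToolRunner.run(new Configuration(), new TestHBaseRun(), args);
        System.exit(exitCode);
    }
}

Going through ToolRunner also enables the generic -libjars option, which is one common way to put extra jars such as Guava on the classpath of the map and reduce tasks.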

Solution

A ClassNotFoundException means that the required .jar, the one containing com.google.common.primitives.Longs, cannot be found at runtime; in this case it is the reduce task JVM on the cluster that fails to find it, as the stack trace shows.
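As a quick sanity check (a minimal sketch, not from the original answer), you can try loading the class by name from a trivial standalone program; if this succeeds on your client machine but the job still dies in the reducer, the jar is present locally but missing from the classpath of the task JVMs:

// Hypothetical check: succeeds only if Guava is on the classpath used to launch it.
public class CheckGuava {
    public static void main(String[] args) throws Exception {
        Class<?> longs = Class.forName("com.google.common.primitives.Longs");
        System.out.println("Found " + longs.getName() + " in "
                + longs.getProtectionDomain().getCodeSource().getLocation());
    }
}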

There are several ways to solve this issue:

  • If you're just playing with Hadoop, the simplest way to solve this issue is to copy the required .jar into /usr/share/hadoop/lib.
  • Add the path to the required .jar to HADOOP_CLASSPATH. To do so, open /etc/hbase/hbase-env.sh and add:

    export HADOOP_CLASSPATH="<jar_files>:$HADOOP_CLASSPATH"

  • Create a folder /lib in your root project folder. Copy your .jar into that folder. Create a package (.jar) for your project. The result will be a fat jar containing all the jars included in /lib.
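A programmatic variant worth noting (not part of the original answer, and dependent on the HBase version in use) is to let HBase ship the jars that contain given classes with the job via the distributed cache, so the reduce tasks see Guava without any cluster-side changes. A sketch, assuming the hbasejob instance from the run() method above:

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;       // any HBase class already used by the job
import com.google.common.primitives.Longs;       // the Guava class the reducer fails to load

// Inside run(), after the job is configured and before waitForCompletion():
TableMapReduceUtil.addDependencyJars(hbasejob.getConfiguration(),
        Bytes.class, Longs.class);               // ships the jars containing these classes with the job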
