HBase Mapreduce Dependency Issue when using TableMapper


Question


I am using CDH 5.3 and I am trying to write a MapReduce program to scan a table and do some processing. I have created a mapper that extends TableMapper, and the exception I am getting is:

java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)

But as you can see, it is searching for protobuf-java-2.5.0.jar on an HDFS path, while the jar actually exists on the local path /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar; I verified this. This does not happen with normal MapReduce programs; the error occurs only when I use TableMapper.
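Not part of the original question, just a hedged diagnostic sketch: as far as I can tell, in this HBase/Hadoop generation TableMapReduceUtil.addDependencyJars(job) (called in the driver shown below) records the jars it locates in the job configuration under the "tmpjars" key, and an entry that is a bare local path with no file:// scheme is qualified against fs.defaultFS by the job client, which here is hdfs://localhost:54310 and would produce exactly this FileNotFoundException. Printing the value before submission shows what is actually being shipped:

// Hedged diagnostic, assuming the `job` object from the driver below; place it
// after TableMapReduceUtil.addDependencyJars(job) and before job.waitForCompletion(true).
String tmpJars = job.getConfiguration().get("tmpjars");
if (tmpJars != null) {
    for (String entry : tmpJars.split(",")) {
        // An entry like /usr/local/.../protobuf-java-2.5.0.jar (no scheme) will be
        // resolved against the default filesystem, i.e. HDFS in this setup.
        System.out.println("dependency jar: " + entry);
    }
}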

My driver code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        hbaseConfig.set("hbase.zookeeper.property.clientport", PropertiesUtil.getZookeperPortNum());

        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");
        job.setJarByClass(AppDriver.class);

        // Create a scan
        Scan scan = new Scan();
        scan.setCaching(500);       // 1 is the default in Scan, which is bad for MapReduce jobs
        scan.setCacheBlocks(false); // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));

        // Use the HBase table as the job input with ESportMapper as the mapper
        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan,
                ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);
        job.setNumReduceTasks(1);

        // Ship the mapper/HBase dependency jars with the job
        TableMapReduceUtil.addDependencyJars(job);

        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}

I am passing the properties file as args[0].
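ESportMapper itself is not shown in the question. Purely for context, here is a minimal hedged sketch of what a TableMapper with the output types used in initTableMapperJob above (Text key, RecordStatusVO value) could look like; RecordStatusVO is the author's own class (assumed to implement Writable and have a no-arg constructor), and the body below is illustrative only, not the author's actual code:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Illustrative sketch only; not the author's actual ESportMapper.
public class ESportMapper extends TableMapper<Text, RecordStatusVO> {

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical processing: emit one value per scanned row, keyed by the row key.
        RecordStatusVO record = new RecordStatusVO();   // assumed no-arg constructor
        context.write(new Text(Bytes.toString(rowKey.get())), record);
    }
}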

Some more background info:

I am using standalone CDH 5.3 on my local system with HBase 0.98.6. My HBase runs on top of HDFS in pseudo-distributed mode.
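Not part of the original post: given the pseudo-distributed setup described above, a hedged sanity check is to print which filesystem and HBase root the client-side configuration actually resolves to. If fs.defaultFS or hbase.rootdir is not what you expect (hdfs://localhost:54310 here), the job client is picking up a different Hadoop configuration than intended, which would be consistent with the environment issue described in the solution below. ConfigCheck is a hypothetical helper class name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hedged sanity check of the effective client configuration (not from the post).
public class ConfigCheck {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // In this pseudo-distributed setup these are expected to point at the
        // local HDFS instance and the local ZooKeeper quorum.
        System.out.println("fs.defaultFS           = " + conf.get("fs.defaultFS"));
        System.out.println("hbase.rootdir          = " + conf.get("hbase.rootdir"));
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
    }
}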

My build.gradle is as follows:

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'

// Basic Properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'

version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"

jar {
    manifest {
        attributes "Main-Class": "$mainClassName"
    }

    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }

    zip64 true
}


repositories {
    mavenCentral()
    maven { url "http://clojars.org/repo" }
    maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}

dependencies {

testCompile group: 'junit', name: 'junit', version: '4.+'

compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
compile 'org.apache.storm:storm-core:0.9.4'
compile 'org.apache.commons:commons-compress:1.5'
compile 'org.elasticsearch:elasticsearch:1.7.1'

compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0'){
    exclude group: 'org.slf4j'
}
compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {

    exclude group: 'org.slf4j'
    exclude group: 'org.jruby'
    exclude group: 'jruby-complete'
    exclude group: 'org.codehaus.jackson'

}

compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'

compile('com.thinkaurelius.titan:titan-core:0.5.2'){
    exclude group: 'org.slf4j'
}
compile('com.thinkaurelius.titan:titan-hbase:0.5.2'){
    exclude group: 'org.apache.hbase'
    exclude group: 'org.slf4j'
}
compile('com.tinkerpop.gremlin:gremlin-java:2.6.0'){
    exclude group: 'org.slf4j'
}
compile 'org.perf4j:perf4j:0.9.16'

compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'

}

And I am using this command to run the jar:

hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput

Solution

If you are trying to set up HBase in pseudo-distributed mode, the most probable reason for this is that the Hadoop home has been added to $PATH.
By removing the Hadoop home from $PATH you can start HBase in pseudo-distributed mode.
Some people add the Hadoop home to .bashrc by default; if you have added it there, remove the Hadoop home from .bashrc.
