HBase MapReduce dependency issue when using TableMapper
Problem description
I am using CDH 5.3 and I am trying to write a MapReduce program to scan a table and do some processing. I have created a mapper that extends TableMapper, and the exception I am getting is:
java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:267)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:388)
But as you can see, it is searching for protobuf-java-2.5.0.jar on an HDFS path, while the jar actually exists on the local path /usr/local/hadoop-2.5-cdh-3.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar (I verified this). This does not happen with normal MapReduce programs; the error occurs only when I use TableMapper.
My driver code is as follows:
public class AppDriver {

    public static void main(String[] args) throws Exception {
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", PropertiesUtil.getZookeperHostName());
        hbaseConfig.set("hbase.zookeeper.property.clientport", PropertiesUtil.getZookeperPortNum());
        Job job = Job.getInstance(hbaseConfig, "hbasemapreducejob");
        job.setJarByClass(AppDriver.class);
        // Create a scan
        Scan scan = new Scan();
        scan.setCaching(500);       // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false); // don't set to true for MR jobs
        // scan.setStartRow(Bytes.toBytes(PropertiesUtil.getHbaseStartRowkey()));
        // scan.setStopRow(Bytes.toBytes(PropertiesUtil.getHbaseStopRowkey()));
        TableMapReduceUtil.initTableMapperJob(PropertiesUtil.getHbaseTableName(), scan, ESportMapper.class, Text.class, RecordStatusVO.class, job);
        job.setReducerClass(ESportReducer.class);
        job.setNumReduceTasks(1);
        TableMapReduceUtil.addDependencyJars(job);
        // Write the results to a file in the output directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean b = job.waitForCompletion(true);
        if (!b) {
            throw new IOException("error with job!");
        }
    }
}
I am taking the properties file as args[0].
Some more background info:
I am using standalone CDH 5.3 on my local system with HBase 0.98.6. My HBase is running on top of HDFS in pseudo-distributed mode.
My build.gradle is as follows:
apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'application'
// Basic Properties
sourceCompatibility = 1.7
targetCompatibility = '1.7'
version = '3.0'
mainClassName = "com.ESport.mapreduce.App.AppDriver"
jar {
manifest {
attributes "Main-Class": "$mainClassName"
}
from {
configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
}
zip64 true
}
repositories {
mavenCentral()
maven { url "http://clojars.org/repo" }
maven { url "http://repository.cloudera.com/artifactory/cloudera-repos/" }
}
dependencies {
testCompile group: 'junit', name: 'junit', version: '4.+'
compile group: 'commons-collections', name: 'commons-collections', version: '3.2'
compile 'org.apache.storm:storm-core:0.9.4'
compile 'org.apache.commons:commons-compress:1.5'
compile 'org.elasticsearch:elasticsearch:1.7.1'
compile('org.apache.hadoop:hadoop-client:2.5.0-cdh5.3.0'){
exclude group: 'org.slf4j'
}
compile('org.apache.hbase:hbase-client:0.98.6-cdh5.3.0') {
exclude group: 'org.slf4j'
exclude group: 'org.jruby'
exclude group: 'jruby-complete'
exclude group: 'org.codehaus.jackson'
}
compile 'org.apache.hbase:hbase-common:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-server:0.98.6-cdh5.3.0'
compile 'org.apache.hbase:hbase-protocol:0.98.6-cdh5.3.0'
compile('com.thinkaurelius.titan:titan-core:0.5.2'){
exclude group: 'org.slf4j'
}
compile('com.thinkaurelius.titan:titan-hbase:0.5.2'){
exclude group: 'org.apache.hbase'
exclude group: 'org.slf4j'
}
compile('com.tinkerpop.gremlin:gremlin-java:2.6.0'){
exclude group: 'org.slf4j'
}
compile 'org.perf4j:perf4j:0.9.16'
compile 'com.fasterxml.jackson.core:jackson-core:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-databind:2.5.3'
compile 'com.fasterxml.jackson.core:jackson-annotations:2.5.3'
compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.1.2'
}
And I am using this command to run the jar:
hadoop jar ESportingMapReduce-3.0.jar config.properties /myoutput
Answer:
If you are trying to set up HBase in pseudo-distributed mode, the most probable reason for this error is that the Hadoop home directory has been added to $PATH. By simply removing the Hadoop home from $PATH, you can start HBase in pseudo-distributed mode.
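As a sketch of the session-level fix (the Hadoop directory below is the install path from the question's stack trace; substitute your own, and note this only affects the current shell):

```shell
# Example only: substitute your actual Hadoop bin directory for HADOOP_BIN.
HADOOP_BIN="/usr/local/hadoop-2.5-cdh-3.0/bin"

# Rebuild PATH without that entry and export it for the current session.
PATH=$(printf '%s' "$PATH" | tr ':' '\n' | grep -v -x "$HADOOP_BIN" | paste -sd ':' -)
export PATH
echo "$PATH"
```

Open a new terminal (or re-run the job from this shell) so that HBase no longer picks up the local Hadoop client from PATH.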
Some people add the Hadoop home to .bashrc by default. If you have added it there, remove the Hadoop home from .bashrc.
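To make the change permanent, the relevant lines in .bashrc can be commented out. This is an illustrative config fragment; the variable names and install path are assumptions and should be matched against what your .bashrc actually contains:

```shell
# Before (hypothetical .bashrc entries that put Hadoop on PATH):
# export HADOOP_HOME=/usr/local/hadoop-2.5-cdh-3.0
# export PATH=$PATH:$HADOOP_HOME/bin

# After: leave the entries commented out as above, then reload the config:
# source ~/.bashrc
```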