hadoop -libjars and ClassNotFoundException
Problem description
Please help, I'm stuck. Here is my command to run the job:
hadoop jar mrjob.jar ru.package.Main -files hdfs://0.0.0.0:8020/MyCatalog/jars/metadata.csv -libjars hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar,hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/result_file
I get these WARNs:
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/opencsv.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/gson.jar is not on the local filesystem. Ignoring.
12/10/26 18:35:50 WARN util.GenericOptionsParser: The libjars file hdfs://0.0.0.0:8020/MyCatalog/jars/my-utils.jar is not on the local filesystem. Ignoring.
Then: Exception in thread "main" java.lang.NoClassDefFoundError on the line in the Main class where I try to instantiate a class from the jar named my-utils.jar.
- All these jars are in HDFS (I can see them through the file browser)
- my-utils.jar does contain the class that causes the NoClassDefFoundError
What am I doing wrong?
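For context: -files and -libjars are only processed when the driver hands its arguments to GenericOptionsParser, usually by going through ToolRunner. A minimal sketch of such a driver (the job name and I/O wiring here are illustrative, not my actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative driver: ToolRunner runs GenericOptionsParser first, so
// -files and -libjars are stripped from args and folded into the conf
// before run() sees the remaining arguments (input and output paths).
public class Main extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "my job"); // getConf() already carries tmpjars
    job.setJarByClass(Main.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new Main(), args));
  }
}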
UPD: I inspected the source code of GenericOptionsParser:
/**
 * If libjars are set in the conf, parse the libjars.
 * @param conf
 * @return libjar urls
 * @throws IOException
 */
public static URL[] getLibJars(Configuration conf) throws IOException {
  String jars = conf.get("tmpjars");
  if (jars == null) {
    return null;
  }
  String[] files = jars.split(",");
  List<URL> cp = new ArrayList<URL>();
  for (String file : files) {
    Path tmp = new Path(file);
    if (tmp.getFileSystem(conf).equals(FileSystem.getLocal(conf))) {
      cp.add(FileSystem.getLocal(conf).pathToFile(tmp).toURI().toURL());
    } else {
      LOG.warn("The libjars file " + tmp + " is not on the local " +
               "filesystem. Ignoring.");
    }
  }
  return cp.toArray(new URL[0]);
}
So: 1. there must be no spaces after the commas in the -libjars list; 2. I still don't get it... I've tried pointing at the local file system and at HDFS, and the result is the same. It seems the classes are just not being added...
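As the source above shows, anything not on the local filesystem is silently dropped. If the jars really have to stay in HDFS, one era-appropriate workaround (my assumption, not something verified in this thread) is to skip -libjars and add them to the task classpath through the distributed cache:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// Hypothetical workaround for HDFS-hosted jars: put them on the task
// classpath via the distributed cache instead of -libjars.
public class AddHdfsJars {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The path refers to a jar that already sits in HDFS.
    DistributedCache.addFileToClassPath(new Path("/MyCatalog/jars/opencsv.jar"), conf);
    // ...then build and submit the Job with this conf as usual.
  }
}

Note that the distributed cache only helps the map and reduce tasks; the NoClassDefFoundError above happens in the client-side Main class, which still needs the jars on the local classpath.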
The problem is solved. The correct invocation is:
hadoop jar my-job.jar ru.package.Main -files /home/cloudera/uploaded_jars/metadata.csv -libjars /home/cloudera/uploaded_jars/opencsv.jar,/home/cloudera/uploaded_jars/gson.jar,/home/cloudera/uploaded_jars/url-raiting-utils.jar /MyCatalog/http_requests.seq-r-00000 /MyCatalog/output/scoring_result
where /MyCatalog is an HDFS path and /home/cloudera/uploaded_jars/ is a local FS path.

The problem was in the job jar. Previously I had tried to run the job using a simple jar with only three classes: the Mapper, the Reducer, and the Main class. This time I provided the other jar that Maven generates (it produces two of them). The second job jar contains all the dependency libs inside it. The structure looks like this:

my-job.jar
-lib
--aopalliance-1.0.jar asm-3.2.jar avro-1.5.4.jar ... commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar ... zookeeper-3.4.3-cdh4.0.0.jar
There are 76 jars inside the lib folder.
It works, but I don't understand why.
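A plausible explanation (my guess, not confirmed in this thread): the hadoop jar launcher unpacks the job jar into a temporary working directory and adds everything under its lib/ directory to the classpath, which is why the fat jar resolves the client-side NoClassDefFoundError. Roughly, paraphrasing what org.apache.hadoop.util.RunJar does (not the verbatim Hadoop source):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Paraphrased sketch of how "hadoop jar" builds the classpath after
// unpacking the job jar into workDir.
public class RunJarSketch {
  public static ClassLoader buildClassLoader(File jobJar, File workDir) throws Exception {
    List<URL> classPath = new ArrayList<URL>();
    classPath.add(jobJar.toURI().toURL());                        // the job jar itself
    classPath.add(new File(workDir, "classes/").toURI().toURL()); // unpacked classes
    File[] libs = new File(workDir, "lib").listFiles();
    if (libs != null) {
      for (File lib : libs) {
        classPath.add(lib.toURI().toURL());                       // every jar under lib/
      }
    }
    return new URLClassLoader(classPath.toArray(new URL[0]));
  }
}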