Hadoop distributed cache: using -libjars: how to use external jars in your code
Problem description
I am able to add external jars to my job using the -libjars option. Now, how do I use those external jars in my code? Say I have a function defined in that jar which operates on a String. How do I call it? Using context.getArchiveClassPaths() I can get a path to the jar, but I don't know how to instantiate the object.
Here is the sample class from the jar that I am importing:
package replace;
public class ReplacingAcronyms {
    public static String Replace(String abc) {
        String n;
        n = "This is trial";
        return n;
    }
}
public class wc_runner extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(new Configuration());
        job.setJarByClass(wc_runner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(wc_mapper.class);
        job.setCombinerClass(wc_reducer.class);
        job.setReducerClass(wc_reducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return (job.waitForCompletion(true) ? 0 : 1);
    }
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new wc_runner(), args);
        System.exit(exitCode);
    }
}
Commands run
[training@localhost Desktop]$ export HADOOP_CLASSPATH=file:///home/training/Desktop/replace.jar
[training@localhost Desktop]$ hadoop jar try1.jar wc_runner /user/training/MR/custom/trial1 /user/training/MR/custom/out -libjars ./replace.jar
Error
14/03/08 02:39:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/08 02:39:41 INFO input.FileInputFormat: Total input paths to process : 1
14/03/08 02:39:41 WARN snappy.LoadSnappy: Snappy native library is available
14/03/08 02:39:41 INFO snappy.LoadSnappy: Snappy native library loaded
14/03/08 02:39:41 INFO mapred.JobClient: Running job: job_201403080114_0021
14/03/08 02:39:42 INFO mapred.JobClient: map 0% reduce 0%
14/03/08 02:39:46 INFO mapred.JobClient: Task Id : attempt_201403080114_0021_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: replace.ReplacingAcronyms
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190
Recommended answer
Import your package in your MapReduce code, then add your jar file's path to HADOOP_CLASSPATH before running the MapReduce job.
For example, in your MapReduce Java source:
import your.external.package;
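Once the class is imported and the jar is on the classpath, nothing needs to be instantiated from the path returned by context.getArchiveClassPaths(): a static method like the question's Replace can be called directly. Below is a minimal sketch in plain Java, without any Hadoop dependency. MapLogicSketch and transform are hypothetical names standing in for the body of a map() method, and the package declaration of ReplacingAcronyms is left out so that everything fits in one file.

```java
// Hypothetical stand-in for the body of Mapper.map(...); not a real Hadoop class.
class MapLogicSketch {
    // Calls the static method directly; no object has to be created.
    static String transform(String line) {
        return ReplacingAcronyms.Replace(line);
    }

    public static void main(String[] args) {
        // Prints "This is trial" because Replace ignores its input.
        System.out.println(transform("some input line"));
    }
}

// Copy of the class from the question's replace.jar.
// In the real jar it lives in package "replace" and is pulled in with
// "import replace.ReplacingAcronyms;".
class ReplacingAcronyms {
    public static String Replace(String abc) {
        String n;
        n = "This is trial";
        return n;
    }
}
```

In an actual mapper the same call would sit inside map(), for example wrapping the result in a Text before writing it to the context.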
When compiling:
javac -cp /path/to/your/external/package.jar:...
When running hadoop jar:
export HADOOP_CLASSPATH=/path/to/your/external/package.jar
hadoop jar yourmapred.jar your.class -libjars /path/to/your/external/package.jar ....
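One thing worth checking is where the option sits on the command line. As far as I know, Hadoop's generic options (including -libjars) are only picked up when they come before the application's own arguments, and only when the job is built from the Configuration returned by getConf(). Please verify both points against your Hadoop version. The sketch below is not Hadoop's parser. ArgOrderSketch and findLibJars are hypothetical names that only imitate a "stop at the first non-option argument" rule, assuming every option takes exactly one value.

```java
// Hypothetical illustration only; this is NOT Hadoop's GenericOptionsParser.
class ArgOrderSketch {
    // Returns the value given to -libjars if it appears before any
    // positional argument, otherwise null.
    // Assumption: every option is followed by exactly one value.
    static String findLibJars(String[] args) {
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("-")) {
                return null; // first positional argument: stop scanning
            }
            if (args[i].equals("-libjars")) {
                return args[i + 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Option placed before the input and output paths.
        String[] optionFirst = {"-libjars", "./replace.jar", "in", "out"};
        // Option placed after the paths, as in the question's command.
        String[] optionLast = {"in", "out", "-libjars", "./replace.jar"};
        System.out.println(findLibJars(optionFirst));
        System.out.println(findLibJars(optionLast));
    }
}
```

Under that assumption, the second ordering leaves -libjars unseen by the option scan, which would be consistent with a ClassNotFoundException in the map task.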