hadoop, how to include third-party JARs when running a MapReduce job
Problem description
As we know, currently you need to pack every required class into the job JAR and upload it to the server, which is slow. I would like to know whether there is a way to specify third-party JARs when executing a MapReduce job, so that I only have to package my own classes without their dependencies.
PS: I found there is a "-libjars" option, but I can't figure out how to use it. Here is the link: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
Recommended answer
Those are called generic options. So, to support them, your job should implement Tool.
Like this (note that generic options such as -libjars must come before your application's own arguments):

hadoop jar yourfile.jar [mainClass] -libjars <comma-separated list of jars> args
To implement Tool and extend Configured, you do something like this in your MapReduce application:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourClass extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new YourClass(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Parse your normal (application-specific) arguments here;
        // ToolRunner has already consumed the generic options.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "Name of job");
        job.setJarByClass(YourClass.class);
        // set the mapper/reducer classes, output key/value types, etc.

        // accept the HDFS input and output dirs at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
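Putting it together, a run with third-party JARs might look like the sketch below (the JAR names and HDFS paths are placeholders, not from the original post):

```shell
# Generic options such as -libjars go after the main class
# but before the application arguments that run() will see.
hadoop jar yourfile.jar YourClass \
    -libjars /path/to/thirdparty1.jar,/path/to/thirdparty2.jar \
    /user/me/input /user/me/output
```

ToolRunner strips -libjars before calling run(), so args[0] and args[1] arrive as the input and output paths. The listed JARs are shipped to the cluster via the distributed cache and added to the task classpath on each node, which is why your job JAR no longer needs to bundle them.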