hadoop, how to include a third-party jar when running a MapReduce job

Problem description

As we know, we currently need to pack every required class into the job jar and upload it to the server, which is slow. I'd like to know whether there is a way to specify third-party jars when executing a MapReduce job, so that I only have to package my own classes without their dependencies.

PS: I found there is a "-libjars" option, but I couldn't figure out how to use it. Here is the link: http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Recommended answer

Those are called generic options. So, to support those, your job should implement Tool.

Like this:

hadoop jar yourfile.jar [mainClass] -libjars <comma-separated list of jars> args
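
For example, assuming a job jar named myjob.jar, a main class com.example.YourClass, and two local dependency jars (all of these names are illustrative, not from the original answer):

hadoop jar myjob.jar com.example.YourClass -libjars /local/path/dep1.jar,/local/path/dep2.jar /user/me/input /user/me/output

Note that generic options such as -libjars must come before the application's own arguments; otherwise GenericOptionsParser does not recognize them and simply passes them through to your program unparsed.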

To implement Tool and extend Configured, you do something like this in your MapReduce application --

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class YourClass extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new YourClass(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        // parse your normal arguments here

        Configuration conf = getConf();
        Job job = new Job(conf, "Name of job");

        // set the class names etc

        // set the output data type classes etc

        // accept the HDFS input and output dirs at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
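
The placeholder comments above are where the job wiring goes. As a minimal sketch, assuming hypothetical WordMapper and WordReducer classes and Text/IntWritable output types (these names are illustrative, not part of the original answer):

// Hypothetical wiring for the placeholder comments above
job.setJarByClass(YourClass.class);          // lets Hadoop locate and ship the job jar
job.setMapperClass(WordMapper.class);        // hypothetical Mapper subclass
job.setReducerClass(WordReducer.class);      // hypothetical Reducer subclass
job.setOutputKeyClass(Text.class);           // key type the job emits
job.setOutputValueClass(IntWritable.class);  // value type the job emits

Because ToolRunner feeds the command line through GenericOptionsParser before calling run(), the -libjars option is consumed there; run() only receives the remaining application arguments, so args[0] and args[1] above are the HDFS input and output paths.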
