Difference in calling the job

Problem description

What is the difference between calling a MapReduce job from main() and from ToolRunner.run()? When the main class, say MapReduce, extends Configured and implements Tool, what additional privileges do we get that we would not have if we simply ran the job from the main method? Thanks.

Solution

There are no extra privileges, but your command-line options are run through the GenericOptionsParser, which lets you extract certain configuration properties and configure a Configuration object from them:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/GenericOptionsParser.html

Basically, rather than parsing some options yourself (using the index of the argument in the list), you can explicitly set Configuration properties from the command line:

hadoop jar myJar.jar com.Main prop1value prop2value

public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Positional parsing: the meaning of each argument depends on its index
    conf.set("prop1", args[0]);
    conf.set("prop2", args[1]);

    conf.get("prop1"); // will resolve to "prop1value"
    conf.get("prop2"); // will resolve to "prop2value"
}

This becomes much more concise with ToolRunner:

hadoop jar myJar.jar com.Main -Dprop1=prop1value -Dprop2=prop2value

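public int run(String[] args) {
    Configuration conf = getConf(); // already populated from the -D options

    conf.get("prop1"); // will resolve to "prop1value"
    conf.get("prop2"); // will resolve to "prop2value"
    return 0; // exit code: 0 signals success
}

public static void main(String[] args) throws Exception {
    // ToolRunner runs GenericOptionsParser over args before calling run()
    System.exit(ToolRunner.run(new Configuration(), new Main(), args));
}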
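
To make the -D handling less of a black box, here is a minimal sketch of the idea in plain Java. It is not Hadoop code and not how GenericOptionsParser is actually implemented: the class name GenericOptionsSketch, its Parsed type and the parse method are all hypothetical, a plain map stands in for Configuration, and only the -Dkey=value and -D key=value forms are handled (the real parser supports further generic options and has its own rules about option ordering).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the -D handling described above; NOT Hadoop's implementation.
final class GenericOptionsSketch {

    /** Result of the split: properties taken from -D options, plus everything else. */
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remainingArgs = new ArrayList<>();
    }

    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[++i];            // form: -D key=value
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);     // form: -Dkey=value
            }
            if (pair == null) {
                parsed.remainingArgs.add(arg); // not a -D option: left for run()
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                parsed.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            // A pair without '=' is silently dropped in this sketch.
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed parsed = parse(new String[] {
            "-Dprop1=prop1value", "-Dprop2=prop2value", "input", "output"});
        System.out.println(parsed.properties);    // the two properties, in order
        System.out.println(parsed.remainingArgs); // the arguments run() would receive
    }
}
```

The point of the split is that run() only sees the application's own arguments (here "input" and "output"), while the properties are already available through getConf().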

One final word of warning, though: when using getConf(), create your Job object first and then pull its Configuration out. The Job constructor makes a copy of the Configuration object passed in, so if you make changes to the reference you passed in, your job will not see those changes:

public int run(String[] args) throws Exception {
    Configuration conf = getConf();

    conf.set("prop3", "blah");

    Job job = new Job(conf); // job will have a deep copy of conf

    conf.set("prop4", "dummy"); // here we're amending the original conf

    job.getConfiguration().get("prop4"); // will resolve to null
    // To reach the job, set it on the copy: job.getConfiguration().set("prop4", "dummy")
    return 0;
}
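
The copy behaviour can be reproduced without a Hadoop installation. The sketch below is plain Java with hypothetical stand-ins: ToyConf plays the role of Configuration and ToyJob the role of Job, on the assumption (taken from the answer above) that the job copies the configuration it is given. It shows both the pitfall and the fix of setting the property on the job's own copy.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the copy pitfall; ToyConf and ToyJob are
// hypothetical stand-ins, not Hadoop classes.
final class ConfCopySketch {

    /** Stand-in for Configuration: string properties only. */
    static final class ToyConf {
        private final Map<String, String> props = new HashMap<>();

        ToyConf() {}

        /** Copy constructor: later changes to 'other' are not reflected here. */
        ToyConf(ToyConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Stand-in for Job: keeps its own copy of the conf it is given. */
    static final class ToyJob {
        private final ToyConf conf;

        ToyJob(ToyConf conf) { this.conf = new ToyConf(conf); }

        ToyConf getConfiguration() { return conf; }
    }

    public static void main(String[] args) {
        ToyConf conf = new ToyConf();
        conf.set("prop3", "blah");          // set before the job is created

        ToyJob job = new ToyJob(conf);      // the job copies conf here

        conf.set("prop4", "dummy");         // too late: only the original changes

        System.out.println(job.getConfiguration().get("prop3")); // blah
        System.out.println(job.getConfiguration().get("prop4")); // null

        // The fix: modify the configuration the job actually holds.
        job.getConfiguration().set("prop4", "dummy");
        System.out.println(job.getConfiguration().get("prop4")); // dummy
    }
}
```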
