Difference in calling the job
What is the difference between calling a MapReduce job from main() and from ToolRunner.run()? When the main class, say MapReduce, extends Configured and implements Tool, what additional privileges do we get that we would not have if we simply ran the job from the main method? Thanks.
There are no extra privileges, but your command-line options are run through the GenericOptionsParser, which lets you extract certain configuration properties and configure a Configuration object from them:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/util/GenericOptionsParser.html
Basically, rather than parsing some options yourself (using the index of each argument in the list), you can explicitly set Configuration properties from the command line. Parsing the arguments by hand looks like this:
hadoop jar myJar.jar com.Main prop1value prop2value
public static void main(String[] args) {
    Configuration conf = new Configuration();
    // positional arguments: you have to know which index holds which property
    conf.set("prop1", args[0]);
    conf.set("prop2", args[1]);
    conf.get("prop1"); // will resolve to "prop1value"
    conf.get("prop2"); // will resolve to "prop2value"
}
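With ToolRunner, where main() hands the arguments to ToolRunner.run(new Main(), args), this becomes much more condensed:
hadoop jar myJar.jar com.Main -Dprop1=prop1value -Dprop2=prop2value
public int run(String[] args) {
    Configuration conf = getConf(); // already populated from the -D options
    conf.get("prop1"); // will resolve to "prop1value"
    conf.get("prop2"); // will resolve to "prop2value"
    return 0; // run() must return an int exit code
}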
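To make the -D handling more concrete, here is a rough plain-Java sketch of the idea: property definitions are peeled off the argument list and everything else is passed through to run(). DashDSketch is a hypothetical class written only for this illustration, and a Map stands in for the Configuration. It is not Hadoop's GenericOptionsParser, which also understands other generic options such as -conf and -files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT Hadoop's GenericOptionsParser.
final class DashDSketch {
    final Map<String, String> props = new LinkedHashMap<>(); // plays the role of the Configuration
    final List<String> remaining = new ArrayList<>();        // what run(String[] args) would receive

    DashDSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            // "-D key=value" uses two tokens, "-Dkey=value" uses one
            boolean twoTokens = arg.equals("-D") && i + 1 < args.length;
            String pair;
            if (twoTokens) {
                pair = args[i + 1];
            } else if (arg.startsWith("-D")) {
                pair = arg.substring(2);
            } else {
                pair = "";
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                props.put(pair.substring(0, eq), pair.substring(eq + 1));
                if (twoTokens) {
                    i++; // the value token has been consumed as well
                }
            } else {
                remaining.add(arg); // not a property definition: pass it through
            }
        }
    }

    public static void main(String[] args) {
        DashDSketch parsed = new DashDSketch(
            new String[] {"-Dprop1=prop1value", "-D", "prop2=prop2value", "in", "out"});
        System.out.println(parsed.props);     // {prop1=prop1value, prop2=prop2value}
        System.out.println(parsed.remaining); // [in, out]
    }
}
```

The point of the sketch is the split: properties end up in the configuration, while the job-specific arguments (here "in" and "out") are left for run() to interpret.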
One final word of warning, though: when you use getConf() (the method your class inherits from Configured), create your Job object first, then pull its Configuration out. The Job constructor makes a copy of the Configuration object passed in, so if you make changes to the reference you passed in, your job will not see those changes:
public int run(String[] args) {
    Configuration conf = getConf();
    conf.set("prop3", "blah");
    Job job = new Job(conf); // job will have its own copy of conf
    conf.set("prop4", "dummy"); // here we're amending the original conf, not the job's copy
    job.getConfiguration().get("prop4"); // will resolve to null
    return 0; // run() must return an int exit code
}
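The same pitfall, and the way around it, can be reproduced without Hadoop. The following is only a sketch under stated assumptions: SketchJob and CopyOrderDemo are hypothetical names, and a plain Map stands in for the Configuration. It mimics the copy-on-construction behaviour described above and shows that changes made through the job's own configuration are the ones the job sees.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: SketchJob is NOT Hadoop's Job, and a Map plays the role of the Configuration.
final class SketchJob {
    private final Map<String, String> conf;

    SketchJob(Map<String, String> original) {
        this.conf = new HashMap<>(original); // copy taken at construction time
    }

    Map<String, String> getConfiguration() {
        return conf; // the job's own copy
    }
}

final class CopyOrderDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("prop3", "blah");                 // set before the job exists: copied into the job

        SketchJob job = new SketchJob(conf);
        conf.put("prop4", "dummy");                // too late: only the original changes
        job.getConfiguration().put("prop5", "ok"); // change the job's own copy instead

        System.out.println(job.getConfiguration().get("prop3")); // blah
        System.out.println(job.getConfiguration().get("prop4")); // null
        System.out.println(job.getConfiguration().get("prop5")); // ok
    }
}
```

In the real driver the equivalent habit is to set late properties through job.getConfiguration() rather than through the object returned by getConf().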