Setting parameter in MapReduce Job configuration


Problem description



Is there any way to set a parameter in the job configuration from the Mapper so that it can be accessed from the Reducer?

I tried the code below:

In the Mapper: map(..): context.getConfiguration().set("Sum", "100");
In the Reducer: reduce(..): context.getConfiguration().get("Sum");

But in the reducer the value is returned as null.

Is there any way to implement this, or have I missed something on my side?

Solution

As far as I know, this is not possible. The job configuration is serialized to XML at run-time by the jobtracker, and is copied out to all task nodes. Any changes to the Configuration object will only affect that object, which is local to the specific task JVM; it will not change the XML at every node.
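The behaviour described above can be illustrated with a self-contained sketch (plain Java, no Hadoop dependency; the class and method names here are invented for illustration only). Each task JVM deserializes its own copy of the job configuration, so a set(...) made in the mapper's copy is invisible to the reducer's copy:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigCopyDemo {
    // Stand-in for Hadoop's Configuration: each "task" receives its own
    // copy, mimicking the jobtracker serializing the job XML out to
    // every task node.
    static Map<String, String> deserializeCopy(Map<String, String> jobXml) {
        return new HashMap<>(jobXml);   // fresh, independent copy per task JVM
    }

    public static void main(String[] args) {
        Map<String, String> jobXml = new HashMap<>();
        jobXml.put("mapreduce.job.name", "demo");

        // The map task and the reduce task each get their own copy.
        Map<String, String> mapTaskConf = deserializeCopy(jobXml);
        Map<String, String> reduceTaskConf = deserializeCopy(jobXml);

        // The mapper sets "Sum" on its local copy only...
        mapTaskConf.put("Sum", "100");

        // ...so the reducer's copy never sees it.
        System.out.println(mapTaskConf.get("Sum"));     // 100
        System.out.println(reduceTaskConf.get("Sum"));  // null
    }
}
```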

In general, you should try to avoid any "global" state. It is against the MapReduce paradigm and will generally prevent parallelism. If you absolutely must pass information between the Map and Reduce phase, and you cannot do it via the usual Shuffle/Sort step, then you could try writing to the Distributed Cache, or directly to HDFS.
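If you genuinely must hand a value from the map phase to the reduce phase outside the shuffle, the side-channel pattern mentioned above looks roughly like this. This is a minimal stand-alone sketch, with java.nio file I/O standing in for HDFS and the file name invented for illustration; in a real job, both phases would agree on an HDFS path up front, and Hadoop's guarantee that reduce runs only after map completes makes the read safe:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SideChannelDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS path agreed on by both phases.
        Path side = Files.createTempFile("side-channel", ".txt");

        // "Map phase": write the value to the agreed location.
        Files.writeString(side, "100");

        // "Reduce phase": read it back once the map phase has finished.
        String sum = Files.readString(side);
        System.out.println(sum);  // 100

        Files.delete(side);
    }
}
```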

