Setting a parameter in MapReduce Job configuration


Problem Description

Is there any way to set a parameter in the job configuration from the Mapper so that it is accessible from the Reducer?

I tried the code below.

In the Mapper, map(..): context.getConfiguration().set("Sum", "100");
In the Reducer, reduce(..): context.getConfiguration().get("Sum");
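For reference, a minimal sketch of the attempted pattern (the class names, key/value types, and surrounding job are illustrative; only the "Sum" key comes from the question):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SumMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Writes into this task's local copy of the Configuration only.
        context.getConfiguration().set("Sum", "100");
        context.write(new Text("word"), new IntWritable(1));
    }
}

class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Comes back null: the reduce task has its own copy of the
        // configuration, taken before the mapper's set() ever ran.
        String sum = context.getConfiguration().get("Sum");
        context.write(key, new IntWritable(sum == null ? 0 : Integer.parseInt(sum)));
    }
}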

But in the Reducer the value comes back as null.

Is there any way to implement this, or is there anything I have missed?

Recommended Answer

As far as I know, this is not possible. The job configuration is serialized to XML at run time by the JobTracker and copied out to all task nodes. Any change made to the Configuration object only affects that object, which is local to the specific task JVM; it does not change the XML on any other node.
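If the value is known before the job is submitted, the usual pattern is to set it on the Configuration in the driver; that copy is what gets serialized and shipped to every Mapper and Reducer task. A minimal sketch, with an assumed driver class and the "Sum" key from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SumDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Set the parameter *before* the job is submitted, so it becomes part
        // of the job XML that is shipped to every task node.
        conf.set("Sum", "100");

        // On older Hadoop 1.x releases, use "new Job(conf, "sum-job")" instead.
        Job job = Job.getInstance(conf, "sum-job");
        job.setJarByClass(SumDriver.class);
        // ... mapper/reducer/input/output setup elided ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Either the Mapper or the Reducer can then read it back, for example in setup():

@Override
protected void setup(Context context) {
    String sum = context.getConfiguration().get("Sum");  // "100" in every task
}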

In general, you should try to avoid any "global" state. It goes against the MapReduce paradigm and will generally limit parallelism. If you absolutely must pass information between the Map and Reduce phases, and you cannot do it via the usual shuffle/sort step, then you could try writing to the Distributed Cache or directly to HDFS.
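As a rough illustration of the HDFS workaround (the side directory, key, and accumulated value are assumptions, not part of the original answer): each map task writes its partial value to a side file named after its task attempt ID, and each reduce task reads those files back in setup(). This assumes speculative execution is disabled for mappers, otherwise duplicate attempt files would need to be filtered out.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SideFileMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private long sum = 0;  // whatever the job actually accumulates

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        sum += value.getLength();
        context.write(new Text("len"), new IntWritable(value.getLength()));
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // Write this task's partial value to an assumed side directory on HDFS.
        // The task attempt ID keeps parallel map tasks from clobbering each other.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path out = new Path("/tmp/sum-side/" + context.getTaskAttemptID().toString());
        try (FSDataOutputStream os = fs.create(out, true)) {
            os.writeLong(sum);
        }
    }
}

class SideFileReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
    private long totalFromMaps = 0;

    @Override
    protected void setup(Context context) throws IOException {
        // All map tasks have finished by the time reduce() runs, so every
        // side file is in place; read them back and combine them.
        FileSystem fs = FileSystem.get(context.getConfiguration());
        for (FileStatus status : fs.listStatus(new Path("/tmp/sum-side"))) {
            try (FSDataInputStream in = fs.open(status.getPath())) {
                totalFromMaps += in.readLong();
            }
        }
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, new LongWritable(totalFromMaps));
    }
}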

