How to share a variable in Mapper and Reducer class?


Problem description

I need to share a variable between my mapper and reducer classes. The scenario is as follows:

Suppose my input records are of type A, B and C. I'm processing these records and accordingly generating the key and value for output.collect in the map function. At the same time I've also declared 3 static int variables in the mapper class to keep a count of the records of type A, B and C. Now these variables will be updated by various map threads. When all the map tasks are done I want to pass these three values to the reduce function.

How can this be achieved? I tried overriding the close() method, but it is called after every map function is executed, not when all the map functions are done executing. Or is there any other way to share variables? I wish to output the total count of each type of record along with whatever processed output I'm displaying.

Solution

Counters are there for a specific reason, i.e. to keep count of some specific state, for example "NUMBER_OF_RECORDS_DISCARDED". I believe one can only increment these counters and not set them to an arbitrary value (I may be wrong here). They can certainly be used as message passers, but there is a better way: use the job configuration to set a variable and have it passed along seamlessly. Note that this can only be used to pass a custom message to the mapper or reducer; changes made in the mapper will not be available in the reducer.
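The last caveat, that configuration values only flow from the driver to the tasks, can be modelled in plain Java. The sketch below is not Hadoop code: it assumes, as a simplification, that every task starts from its own copy of the job configuration, and it stands in a HashMap for that configuration. The names ConfigSnapshotDemo, runMapTask and runReduceTask are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Pure-Java model of a one-way job configuration; no Hadoop classes are used.
// ConfigSnapshotDemo, runMapTask and runReduceTask are hypothetical names.
class ConfigSnapshotDemo {

    static final String KEY = "messageToBePassed-OR-anyValue";

    // Models a map task: it gets its own copy of the job configuration,
    // reads the value, and then tries to overwrite it.
    static String runMapTask(Map<String, String> jobConf) {
        Map<String, String> taskConf = new HashMap<>(jobConf); // per-task copy
        String seen = taskConf.get(KEY);
        taskConf.put(KEY, "changed-in-mapper"); // only touches the local copy
        return seen;
    }

    // Models a reduce task: it also starts from a fresh copy of the job configuration.
    static String runReduceTask(Map<String, String> jobConf) {
        Map<String, String> taskConf = new HashMap<>(jobConf);
        return taskConf.get(KEY);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(KEY, "123-awesome-value :P"); // set once by the driver

        System.out.println("mapper sees:  " + runMapTask(jobConf));
        System.out.println("reducer sees: " + runReduceTask(jobConf));
    }
}
```

In this model the reducer still reads the value the driver set, because the mapper's write never leaves its own copy. That is why the configuration suits read-only parameters, while per-record tallies produced inside the tasks need some other channel, such as the counters mentioned above.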

Setting the message/variable using the old mapred API:

JobConf job = (JobConf) getConf();
job.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");

Setting the message/variable using the new mapreduce API:

Configuration conf = new Configuration();
conf.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");
// new Job(conf) is deprecated in Hadoop 2.x and later; Job.getInstance(conf) replaces it.
Job job = Job.getInstance(conf);

Getting the message/variable using the old API in the Mapper and Reducer: configure() has to be implemented in the Mapper and Reducer classes; the values can then be assigned to class members so that they can be used inside map() or reduce().

...
private String awesomeMessage;
public void configure(JobConf job) {
    // The field is a String, so keep the value as-is (Long.parseLong would not compile here).
    awesomeMessage = job.get("messageToBePassed-OR-anyValue");
}
...

The variable awesomeMessage can then be used inside the map and reduce functions.

Getting the message/variable using the new API in the Mapper and Reducer: the same needs to be done in setup(), which receives the Context.

Configuration conf = context.getConfiguration();
String param = conf.get("messageToBePassed-OR-anyValue");
