How to pull data in the Map/Reduce functions?

Question

According to Hadoop: The Definitive Guide:

The new API supports both a "push" and a "pull" style of iteration. In both APIs, key-value record pairs are pushed to the mapper, but in addition, the new API allows a mapper to pull records from within the map() method. The same goes for the reducer. An example of how the "pull" style can be useful is processing records in batches, rather than one by one.

Has anyone pulled data inside the map or reduce functions? I am interested in the relevant API or an example of it.

Recommended answer

I posted the question to the mapreduce-user@hadoop.apache.org mailing list and got an answer.

The next key-value pair can be retrieved from the context object that is passed to map(), by calling nextKeyValue() on it. So in the new API you can pull the next record yourself from inside the map() method.
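To make the pull style concrete, here is a minimal sketch of batching records from inside map(). It does not use Hadoop itself: PullContext is a hypothetical stand-in that mirrors only the nextKeyValue(), getCurrentKey() and getCurrentValue() methods of the new-API context, and ListContext, PullBatchDemo and BATCH_SIZE are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in mirroring the record-access methods of the new-API context.
interface PullContext<K, V> {
    boolean nextKeyValue();   // advance to the next record; false when input is exhausted
    K getCurrentKey();
    V getCurrentValue();
}

// In-memory context over a list of strings, used only for this illustration.
final class ListContext implements PullContext<Long, String> {
    private final Iterator<String> it;
    private long key = -1;
    private String value;

    ListContext(List<String> records) { this.it = records.iterator(); }

    public boolean nextKeyValue() {
        if (!it.hasNext()) return false;
        key++;
        value = it.next();
        return true;
    }
    public Long getCurrentKey() { return key; }
    public String getCurrentValue() { return value; }
}

final class PullBatchDemo {
    static final int BATCH_SIZE = 3; // illustrative value, not a Hadoop setting

    // Push style: this driver loop pulls one record and hands it to map().
    static List<List<String>> run(PullContext<Long, String> context) {
        List<List<String>> batches = new ArrayList<>();
        while (context.nextKeyValue()) {
            batches.add(map(context.getCurrentKey(), context.getCurrentValue(), context));
        }
        return batches;
    }

    // Pull style: map() receives one pushed record, then pulls more
    // from the context until the batch is full or the input runs out.
    static List<String> map(Long key, String value, PullContext<Long, String> context) {
        List<String> batch = new ArrayList<>();
        batch.add(value);
        while (batch.size() < BATCH_SIZE && context.nextKeyValue()) {
            batch.add(context.getCurrentValue());
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(run(new ListContext(input))); // prints [[a, b, c], [d, e, f], [g]]
    }
}
```

In a real Hadoop mapper the same loop would call these methods on the context argument of map(). The record reader typically reuses the key and value objects between calls, so a real batch would likely need to copy each value (for example with new Text(value)) before pulling the next one.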

Is the performance of pull better than push in this scenario? Also, in which scenarios is pull useful?
