Computing median in map reduce

Question

Can someone explain how medians/quantiles are computed in map reduce?

My understanding of Datafu's median is that the 'n' mappers sort the data and send it to a single reducer, which is responsible for sorting all the data from the n mappers and finding the median (middle value). Is my understanding correct?

If so, does this approach scale to massive amounts of data? I can clearly see the single reducer struggling to do the final task. Thanks.

Recommended answer

Finding the median (middle number) of a series requires that a single reducer be passed the entire range of numbers, so that it can determine which is the 'middle' value.

Depending on the range and uniqueness of values in your input set, you could introduce a combiner that outputs the frequency of each value, reducing the number of map outputs sent to your single reducer. Your reducer can then consume the sorted value/frequency pairs to identify the median.
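A minimal sketch of the combiner idea, simulated in plain Python rather than on Hadoop. The function names `combine` and `median_from_frequencies` are hypothetical and not part of Datafu or any MapReduce API; the sketch assumes numeric values and takes the mean of the two middle values when the count is even.

```python
from collections import Counter


def combine(values):
    """Combiner step (hypothetical): collapse one mapper's values into value -> count."""
    return Counter(values)


def median_from_frequencies(partials):
    """Reducer step (hypothetical): merge the counts, then walk them in sorted order."""
    total = Counter()
    for partial in partials:
        total.update(partial)

    n = sum(total.values())
    if n == 0:
        raise ValueError("no values to take a median of")

    # 0-based ranks of the middle element(s); they coincide when n is odd.
    lo, hi = (n - 1) // 2, n // 2

    seen = 0          # how many values have been passed so far
    lo_value = None   # value found at rank `lo`
    for value in sorted(total):
        seen += total[value]
        if lo_value is None and seen > lo:
            lo_value = value
        if seen > hi:
            return (lo_value + value) / 2

    raise AssertionError("unreachable: ranks are always below n")


# Two simulated mappers, each pre-aggregated by the combiner.
partials = [combine([1, 2, 2]), combine([3, 10])]
print(median_from_frequencies(partials))
```

The reducer only ever holds one entry per distinct value, so the saving depends on how many duplicates the input contains; with mostly unique values the combiner helps very little.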

Another way you could scale this (again, if you know the range and rough distribution of values) is to use a custom partitioner that distributes the keys by range buckets (0-99 go to reducer 0, 100-199 to reducer 1, and so on). This will, however, require a secondary job to examine the reducer outputs and perform the final median calculation: knowing the number of keys in each reducer, for example, you can calculate which reducer output contains the median, and at which offset.
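The range-bucket approach can be sketched the same way, again as a plain-Python simulation rather than real Hadoop code. `BUCKET_WIDTH`, `partition`, `reduce_buckets`, `locate`, and `median` are hypothetical names chosen for illustration; the sketch assumes integer values and a fixed bucket width of 100, matching the 0-99 / 100-199 example above.

```python
from collections import defaultdict

BUCKET_WIDTH = 100  # assumed bucket size: 0-99 -> reducer 0, 100-199 -> reducer 1, ...


def partition(value):
    """Custom partitioner (hypothetical): map a value to its range bucket."""
    return value // BUCKET_WIDTH


def reduce_buckets(values):
    """Simulate the reducers: each one receives a single range and sorts it."""
    buckets = defaultdict(list)
    for value in values:
        buckets[partition(value)].append(value)
    return {bucket: sorted(vs) for bucket, vs in buckets.items()}


def locate(buckets, rank):
    """Secondary job: use per-bucket sizes to find the value at a 0-based global rank."""
    for bucket in sorted(buckets):
        size = len(buckets[bucket])
        if rank < size:
            return buckets[bucket][rank]  # rank is now an offset inside this bucket
        rank -= size                      # skip the whole bucket
    raise IndexError("rank out of range")


def median(values):
    buckets = reduce_buckets(values)
    n = sum(len(vs) for vs in buckets.values())
    if n == 0:
        raise ValueError("no values to take a median of")
    # The middle ranks coincide when n is odd.
    return (locate(buckets, (n - 1) // 2) + locate(buckets, n // 2)) / 2


print(median([5, 150, 120, 250, 99]))
```

Because the buckets are ordered by range, only the sizes are needed to decide which reducer output holds the median; the other outputs never have to be read in full. A skewed distribution would still overload one reducer, which is why the answer asks for a rough idea of the distribution up front.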
