Aggregation in MapReduce

Question

How can we find the maximum and minimum elements of a column in a .csv file?

What should we pass into context.write(key, value) in the mapper?

  1. Should it be each column of that CSV file?

Solution

This is a bit broad for an SO question but I'll bite.

Your mapper is for mapping values to keys. Let's say your CSV has 4 columns with numeric values:

42, 71, 45, 22

You map a key to each value; the key is effectively what the column header would be in the CSV. Let's say column 4 represents "Number of widgets". You'd map "number_of_widgets" as the key to the value of column 4 in your mapper.
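A minimal plain-Java sketch of that map step, with no Hadoop dependency so it runs on its own. The class name `ColumnMapperSketch` and the column names `col_1` to `col_3` are made up for illustration; only `number_of_widgets` comes from the example above. In a real Hadoop mapper, the line that collects a pair would instead be a `context.write(key, value)` call, typically with `Text` and `IntWritable` wrappers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ColumnMapperSketch {
    // Hypothetical column names; only "number_of_widgets" appears in the answer.
    static final String[] COLUMN_NAMES = {"col_1", "col_2", "col_3", "number_of_widgets"};

    // Emits one (column name, value) pair per field of a single CSV line.
    static List<Map.Entry<String, Integer>> map(String csvLine) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        String[] fields = csvLine.split(",");
        for (int i = 0; i < fields.length && i < COLUMN_NAMES.length; i++) {
            int value = Integer.parseInt(fields[i].trim());
            // Stand-in for context.write(key, value) in a Hadoop mapper.
            emitted.add(new SimpleEntry<>(COLUMN_NAMES[i], value));
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : map("42, 71, 45, 22")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Emitting every column under its own key answers the numbered question in the post: yes, each column can be written, as long as each one gets a distinct key. If only one column matters, the loop can emit just that field.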

The reducer is going to get all the values for a given key. That's where you figure out your min/max. You just iterate through all the values for the key and keep track of the min and max.
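The reduce step described here can be sketched in plain Java as a single pass over the values for one key. The class name `MinMaxReducerSketch` and the sample values are hypothetical; a real Hadoop reducer would receive the values as an `Iterable` of writables and emit the result through `context.write` rather than returning an array.

```java
import java.util.Arrays;
import java.util.List;

class MinMaxReducerSketch {
    // Returns {min, max} over all values that share one key.
    // Assumes at least one value is present for the key.
    static int[] reduce(String key, Iterable<Integer> values) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // Hypothetical values grouped under one key after the shuffle.
        List<Integer> widgets = Arrays.asList(22, 7, 310, 45);
        int[] result = reduce("number_of_widgets", widgets);
        System.out.println("number_of_widgets min=" + result[0] + " max=" + result[1]);
    }
}
```

Because min and max only need one running value each, the reducer does not have to hold the full list of values in memory.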
