Aggregation in MapReduce
Question
How can we find the maximum and minimum element of a column in a .csv file?
What should we pass into the mapper's context.write(key, value)?
- Should it be each column of that CSV file?
Solution
This is a bit broad for an SO question, but I'll bite.
Your mapper is for mapping values to keys. Let's say your CSV has 4 columns with numeric values:
42, 71, 45, 22
You map a key to each value; effectively, the key is what the column's header in the CSV would be. Let's say column 4 represents "Number of widgets". In your mapper, you'd emit "number_of_widgets" as the key with the value of column 4 as the value.
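As a rough illustration of the mapper side, here is a minimal plain-Java sketch that does not depend on Hadoop. It assumes a fixed, hypothetical set of column names (col_1, col_2, col_3, number_of_widgets) standing in for the CSV header, and it returns the (key, value) pairs instead of writing them to a Hadoop Context. In a real Hadoop Mapper, each pair would be passed to context.write instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ColumnMapperSketch {
    // Hypothetical column names standing in for the CSV header (an assumption, not from a real file).
    static final String[] COLUMN_NAMES = {"col_1", "col_2", "col_3", "number_of_widgets"};

    // Plays the role of map() for one CSV line: emits one (column name, value) pair per column.
    // In a Hadoop Mapper, each pair would instead go to
    // context.write(new Text(name), new LongWritable(value)).
    static List<Map.Entry<String, Long>> map(String csvLine) {
        String[] fields = csvLine.split(",");
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (int i = 0; i < fields.length && i < COLUMN_NAMES.length; i++) {
            // trim() tolerates the spaces after the commas, as in "42, 71, 45, 22"
            emitted.add(Map.entry(COLUMN_NAMES[i], Long.parseLong(fields[i].trim())));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // The example row from the answer; prints one tab-separated key/value pair per column.
        for (Map.Entry<String, Long> pair : map("42, 71, 45, 22")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Running main prints four pairs, ending with number_of_widgets mapped to 22. Using the column name as the key is what allows the framework to route every value of one column to the same reduce call.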
The reducer gets all the values for a given key, so that's where you figure out your min/max: iterate through all the values for the key and keep track of the min and max.
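The reducer side can be sketched the same way. This is a plain-Java simulation under stated assumptions, not Hadoop code: groupByKey is a hypothetical stand-in for the framework's shuffle (which groups mapper output by key), minMax plays the role of reduce() for one key, and the input pairs are made-up mapper output for three rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinMaxReducerSketch {
    // Plays the role of reduce(): iterates over every value for one key, tracking min and max.
    // Hadoop only calls reduce() for keys that have at least one value, so the
    // MAX_VALUE/MIN_VALUE starting points are always overwritten in practice.
    static long[] minMax(Iterable<Long> values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    // Hypothetical stand-in for the shuffle phase: groups mapper output pairs by key.
    static Map<String, List<Long>> groupByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up mapper output for three CSV rows (only two columns shown).
        List<Map.Entry<String, Long>> pairs = List.of(
            Map.entry("col_1", 42L),
            Map.entry("number_of_widgets", 22L),
            Map.entry("col_1", 10L),
            Map.entry("number_of_widgets", 30L),
            Map.entry("col_1", 55L),
            Map.entry("number_of_widgets", 7L));
        // One "reduce call" per key, exactly as the reducer would see it.
        for (Map.Entry<String, List<Long>> e : groupByKey(pairs).entrySet()) {
            long[] mm = minMax(e.getValue());
            System.out.println(e.getKey() + "\tmin=" + mm[0] + "\tmax=" + mm[1]);
        }
    }
}
```

This prints col_1 with min=10 and max=55, then number_of_widgets with min=7 and max=30. Only a running min and max are kept, so memory per key stays constant regardless of how many rows the CSV has.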