How to calculate the centered moving average of a set of data in Hadoop Map-Reduce?
I want to calculate the centered moving average of a set of data.
Example input format:
quarter | sales
Q1'11 | 9
Q2'11 | 8
Q3'11 | 9
Q4'11 | 12
Q1'12 | 9
Q2'12 | 12
Q3'12 | 9
Q4'12 | 10
Mathematical representation of the data, showing the 4-quarter moving average (MA) and then the centered moving average:
Period | Value | MA   | Centered
1      | 9     |      |
1.5    |       |      |
2      | 8     |      |
2.5    |       | 9.5  |
3      | 9     |      | 9.5
3.5    |       | 9.5  |
4      | 12    |      | 10.0
4.5    |       | 10.5 |
5      | 9     |      | 10.5
5.5    |       | 10.5 |
6      | 12    |      |
6.5    |       |      |
7      | 9     |      |
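For reference, here is a minimal sequential sketch of how the MA and Centered columns are derived. It is plain Python rather than Hadoop code; the function names `moving_average` and `centered_moving_average` are made up for illustration, and the window of 4 is an assumption taken from the quarterly data.

```python
def moving_average(values, window=4):
    """Return {midpoint_period: mean} for every full window; periods are 1-based."""
    result = {}
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        # A window over periods start+1 .. start+window is centred between them.
        midpoint = (start + 1) + (window - 1) / 2
        result[midpoint] = sum(chunk) / window
    return result


def centered_moving_average(values, window=4):
    """Average each pair of adjacent moving averages so the result lands on a whole period."""
    ma = moving_average(values, window)
    points = sorted(ma)
    return {(a + b) / 2: (ma[a] + ma[b]) / 2 for a, b in zip(points, points[1:])}


sales = [9, 8, 9, 12, 9, 12, 9, 10]  # Q1'11 .. Q4'12 from the example input
print(moving_average(sales))
print(centered_moving_average(sales))
```

With an even window the plain moving average falls between two periods (2.5, 3.5, ...), which is why the second averaging step is needed to bring the result back onto whole periods.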
I am stuck on implementing a RecordReader that will hand the mapper the sales values for one year at a time, i.e. four quarters. (See the RecordReader Problem question thread.) Thanks.
This is actually totally doable in the MapReduce paradigm; it does not have to be thought of as a 'sliding window'. Instead, think of the fact that each data point is relevant to at most four MA calculations, and remember that each call to the map function can emit more than one key-value pair. Here is pseudo-code, where `quarter` is the numeric period (1, 2, 3, ...) as in the table in the question:
First MR job:
    map(quarter, sales)
        emit(quarter - 1.5, sales)
        emit(quarter - 0.5, sales)
        emit(quarter + 0.5, sales)
        emit(quarter + 1.5, sales)
    reduce(quarter, list_of_sales)
        if (list_of_sales.length == 4):
            emit(quarter, average(list_of_sales))
        endif
Second MR job:
    map(quarter, MA)
        emit(quarter - 0.5, MA)
        emit(quarter + 0.5, MA)
    reduce(quarter, list_of_MA)
        if (list_of_MA.length == 2):
            emit(quarter, average(list_of_MA))
        endif
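To check the logic without a cluster, the two jobs can be simulated in memory. The following is only a sketch in plain Python, not Hadoop API code: `run_job` is a hypothetical stand-in for the map, shuffle and reduce phases, and it assumes the quarter labels have already been converted to consecutive numeric periods (Q1'11 = 1, Q2'11 = 2, and so on).

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def ma_mapper(period, sales):
    # Each sale belongs to the four windows centred at these half-periods.
    for offset in (-1.5, -0.5, 0.5, 1.5):
        yield period + offset, sales


def ma_reducer(midpoint, sales_list):
    # Only windows that received all four quarters are complete.
    if len(sales_list) == 4:
        yield midpoint, sum(sales_list) / 4


def cma_mapper(midpoint, ma):
    # Each moving average contributes to the whole periods on either side.
    yield midpoint - 0.5, ma
    yield midpoint + 0.5, ma


def cma_reducer(period, ma_list):
    # A centered value needs the moving averages on both sides.
    if len(ma_list) == 2:
        yield period, sum(ma_list) / 2


sales = [9, 8, 9, 12, 9, 12, 9, 10]  # Q1'11 .. Q4'12
records = list(enumerate(sales, start=1))  # (period, sales) pairs
moving_averages = run_job(records, ma_mapper, ma_reducer)
centered = run_job(moving_averages, cma_mapper, cma_reducer)
print(centered)  # [(3.0, 9.5), (4.0, 10.0), (5.0, 10.5), (6.0, 10.25)]
```

The length checks in the reducers are what drop the incomplete windows at either end of the series, so no special handling of the boundaries is needed in the mappers.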