How to calculate Centered Moving Average of a set of data in Hadoop Map-Reduce?


Question




I want to calculate the centered moving average of a set of data.

Example input format:

quarter | sales      
Q1'11   | 9            
Q2'11   | 8
Q3'11   | 9
Q4'11   | 12
Q1'12   | 9
Q2'12   | 12
Q3'12   | 9
Q4'12   | 10

Mathematical representation of the data, showing the 4-period moving average (MA) and then the centered moving average:

Period   Value   MA     Centered
1        9
1.5
2        8
2.5              9.5
3        9              9.5
3.5              9.5
4        12             10.0
4.5              10.5
5        9              10.5
5.5              10.5
6        12
6.5
7        9
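
As a reference point for checking any distributed implementation, the calculation can be done directly on a single machine. The following is a minimal Python sketch working from the eight sales figures in the example input; the function names `moving_average` and `centered_moving_average` are illustrative, and numbering the quarters 1 to 8 is an assumption about how the `Q1'11` style labels map to periods.

```python
sales = [9, 8, 9, 12, 9, 12, 9, 10]  # Q1'11 .. Q4'12, taken as periods 1..8


def moving_average(values, window=4):
    """Return {center_period: average} for every full window (1-based periods)."""
    result = {}
    for start in range(len(values) - window + 1):
        # A window over periods 1..4 is centered at 2.5, and so on.
        center = start + 1 + (window - 1) / 2
        result[center] = sum(values[start:start + window]) / window
    return result


def centered_moving_average(values, window=4):
    """Average each pair of adjacent moving averages so results land on whole periods."""
    ma = moving_average(values, window)
    centers = sorted(ma)
    return {(a + b) / 2: (ma[a] + ma[b]) / 2 for a, b in zip(centers, centers[1:])}


ma = moving_average(sales)
cma = centered_moving_average(sales)
print(ma)   # {2.5: 9.5, 3.5: 9.5, 4.5: 10.5, 5.5: 10.5, 6.5: 10.0}
print(cma)  # {3.0: 9.5, 4.0: 10.0, 5.0: 10.5, 6.0: 10.25}
```

Because all eight quarters are used here, the output includes the window centered at period 6.5 and the centered value at period 6.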

I am stuck on implementing a RecordReader that provides the mapper with the sales values for one year, i.e. four quarters (see the RecordReader Problem question thread). Thanks.

Solution

This is actually totally doable in the MapReduce paradigm; it does not have to be thought of as a 'sliding window'. Instead, consider that each data point is relevant to at most four MA calculations, and remember that each call to the map function can emit more than one key-value pair. Here is pseudo-code, where quarter is the numeric period index (1, 2, 3, ...):

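First MR job (4-quarter moving average):

map(quarter, sales)
    # each sale belongs to the four windows centered at these half-periods
    emit(quarter - 1.5, sales)
    emit(quarter - 0.5, sales)
    emit(quarter + 0.5, sales)
    emit(quarter + 1.5, sales)

reduce(quarter, list_of_sales)
    # windows at either end of the series receive fewer than 4 values and are skipped
    if (list_of_sales.length == 4):
        emit(quarter, average(list_of_sales))
    endif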


Second MR job (centering):

map(quarter, MA)
    # each MA sits on a half-period and contributes to the whole periods on either side
    emit(quarter - 0.5, MA)
    emit(quarter + 0.5, MA)

reduce(quarter, list_of_MA)
    if (list_of_MA.length == 2):
        emit(quarter, average(list_of_MA))
    endif
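
To see the two jobs work end to end without a cluster, the pseudo-code can be simulated in memory. This is only a sketch in Python, not Hadoop code: `run_job` is a hypothetical stand-in for the map, shuffle, and reduce phases, the mapper and reducer names are made up for illustration, and the quarters are assumed to be numbered 1 to 8.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # plays the role of the shuffle
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def ma_mapper(period, sales):
    # Each sale belongs to the four windows centered at these half-periods.
    for offset in (-1.5, -0.5, 0.5, 1.5):
        yield period + offset, sales


def ma_reducer(center, sales_list):
    # Only full windows (4 values) produce a moving average.
    if len(sales_list) == 4:
        yield center, sum(sales_list) / 4


def cma_mapper(center, ma):
    # Each moving average contributes to the whole periods on either side.
    for offset in (-0.5, 0.5):
        yield center + offset, ma


def cma_reducer(period, ma_list):
    # Only periods with a moving average on both sides are emitted.
    if len(ma_list) == 2:
        yield period, sum(ma_list) / 2


sales = [9, 8, 9, 12, 9, 12, 9, 10]
records = list(enumerate(sales, start=1))  # [(1, 9), (2, 8), ...]
moving_averages = run_job(records, ma_mapper, ma_reducer)
centered = run_job(moving_averages, cma_mapper, cma_reducer)
print(centered)  # [(3.0, 9.5), (4.0, 10.0), (5.0, 10.5), (6.0, 10.25)]
```

The length checks in the reducers are what handle the edges of the series: keys such as 1.5 or 7.5 collect fewer than four sales values, so no moving average is emitted for them.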
