阿帕奇星火:均线 [英] Apache Spark: Exponential Moving Average

查看:135
本文介绍了阿帕奇星火:均线的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写在星火/斯卡拉的应用程序中,我需要计算列的指数移动平均线。

I am writing an application in Spark/Scala in which I need to calculate the exponential moving average of a column.

EMA_t = (price_t * 0.4) + (EMA_t-1 * 0.6)

我现在面临的问题是,我需要在同一列的previously计算值(EMA_t-1)。通过MySQL的这是有可能通过使用模型或通过创建一个EMA列然后可以更新每行排,但我尝试这样做,与星火SQL或蜂巢语境......无论工作有什么办法我可以访问这EMA_t-1?

The problem I am facing is that I need the previously calculated value (EMA_t-1) of the same column. Via mySQL this would be possible by using MODEL or by creating an EMA column which you can then update row per row, but I've tried this and neither work with the Spark SQL or Hive Context... Is there any way I can access this EMA_t-1?

我的数据是这样的:

timestamp price    
15:31 132.3 
15:32 132.48 
15:33 132.76 
15:34 132.66
15:35 132.71 
15:36 132.52
15:37 132.63
15:38 132.575
15:39 132.57

所以,我需要添加一个新的列在我的第一个值就是第一行的价格,然后我会需要使用previous值:EMA_t =(price_t * 0.4)+(EMA_t-1 * 0.6)来计算该列中的下行。
我EMA列必须是:

So I would need to add a new column where my first value is just the price of the first row and then I would need to use the previous value: EMA_t = (price_t * 0.4) + (EMA_t-1 * 0.6) to calculate the following rows in that column. My EMA column would have to be:

EMA
132.3
132.372
132.5272
132.58032
132.632192
132.5873152
132.6043891
132.5926335
132.5835801

我目前正在使用它星火SQL和蜂巢,但是否可以做的另一种方式,这将是一样受欢迎呢!我也想知道我如何与星火流做到这一点。我的数据是在数据帧和我使用的Spark 1.4.1。

I am currently trying to do it using Spark SQL and Hive but if it is possible to do it in another way, this would be just as welcome! I was also wondering how I could do this with Spark Streaming. My data is in a dataframe and I am using Spark 1.4.1.

非常感谢任何帮助提供!

Thanks a lot for any help provided!

推荐答案

尝试使用窗函数。

// Define EmaFunc() 

val wSpec= Window.partitionBy("timestamp").orderBy("timestamp")

df.select(df.a, EMAFunc().over(wSpec).alias("EMA"))

这篇关于阿帕奇星火:均线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆