pandas ewm.std calculation

Question

I am trying to verify the ewm.std calculations of pandas so that I can implement a one-step update in my code. Here is the complete description of the problem, with code.

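import numpy as np
import pandas as pd

mrt = pd.Series(np.random.randn(1000))
N = 100
a = 2.0/(1+N)                  # smoothing factor implied by span=N (float division)
bias = (2-a)/2/(1-a)           # bias-correction factor used in my formula
x = mrt.iloc[-2]               # newest observation for the one-step update
ma = mrt.ewm(span=N).mean().iloc[-3]    # previous EW mean
var = mrt.ewm(span=N).var().iloc[-3]    # previous EW variance
ans = mrt.ewm(span=N).std().iloc[-2]    # value the update should reproduce
print(np.sqrt( bias*(1-a) * (var + a * (x- ma)**2)), ans)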

(1.1352524643949702, 1.1436193844674576)

I used the standard formulation. Could somebody tell me why the two values are not the same? That is, how does pandas calculate the exponentially weighted std?

After Julien's answer, let me give one more use case. I am plotting the ratio of the variance given by a formula I inferred from the Cython code of the pandas ewm covariance to the variance calculated by pandas. This ratio should be 1. (I am guessing there is a problem with my formula; perhaps somebody can point it out.)

mrt = pd.Series(np.random.randn(1000))

N = 100
a = 2./(1+N)
bias = (2-a)/2./(1-a)
mewma = mrt.ewm(span=N).mean()

var_pandas = mrt.ewm(span=N).var()
var_calculated = bias * (1-a) * (var_pandas.shift(1) + a * (mrt-mewma.shift(1))**2)

(var_calculated/var_pandas).plot()

The plot shows the problem clearly.

EDIT 2: By trial and error, I figured out the right formula:

var_calculated = (1-a) * (var_pandas.shift(1) + bias * a * (mrt-mewma.shift(1))**2)

But I am not convinced that it should be the right one! Can somebody shed some light on that?

Recommended answer

Your question actually reduces to how pandas calculates ewm.var():

In [1]:
(np.sqrt(mrt.ewm(span=span).var()) == mrt.ewm(span=span).std())[1:].value_counts()

Out[1]:
True    999
dtype: int64

So in your example above: ans == np.sqrt(mrt.ewm(span=N).var().iloc[-2]).

As for how it calculates ewmvar(): it calls the ewmcov routine with input_x=input_y=mrt.
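To make that routine concrete, here is a minimal sketch of the running update as I read it from the pandas Cython source, specialised to a single series. It assumes the defaults `adjust=True` and `bias=False` and a series without missing values. The helper name `ewm_var_sketch` and its local variable names are hypothetical, chosen for illustration; this is not the pandas API.

```python
import numpy as np
import pandas as pd


def ewm_var_sketch(values, span):
    """Bias-corrected exponentially weighted variance, updated one step at a time.

    Hypothetical helper: assumes adjust=True, bias=False and no NaNs.
    """
    values = np.asarray(values, dtype=float)
    alpha = 2.0 / (span + 1.0)      # smoothing factor implied by the span
    decay = 1.0 - alpha
    out = np.full(len(values), np.nan)
    if len(values) == 0:
        return out

    mean = values[0]   # running weighted mean
    cov = 0.0          # running *biased* weighted variance
    sum_wt = 1.0       # sum of weights
    sum_wt2 = 1.0      # sum of squared weights
    old_wt = 1.0       # total weight of past observations
    for i in range(1, len(values)):
        # Age all previous weights by one step.
        sum_wt *= decay
        sum_wt2 *= decay * decay
        old_wt *= decay

        # Fold in the new observation with weight 1.
        x = values[i]
        old_mean = mean
        mean = (old_wt * old_mean + x) / (old_wt + 1.0)
        cov = (old_wt * (cov + (old_mean - mean) ** 2)
               + (x - mean) ** 2) / (old_wt + 1.0)

        sum_wt += 1.0
        sum_wt2 += 1.0
        old_wt += 1.0

        # Bias correction: sum_wt^2 / (sum_wt^2 - sum_wt2).
        numerator = sum_wt * sum_wt
        denominator = numerator - sum_wt2
        out[i] = cov * numerator / denominator
    return out


# Compare against pandas on random data (the first value is NaN in both).
rng = np.random.default_rng(0)
mrt = pd.Series(rng.standard_normal(1000))
matches = np.allclose(ewm_var_sketch(mrt.to_numpy(), span=100)[1:],
                      mrt.ewm(span=100).var().to_numpy()[1:])
print(matches)
```

Note that the state carried from one step to the next is the mean, the biased variance and the two weight sums, not only the previous reported variance. That is what a one-step update would need to store.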

If we check for the first elements:

mrt.ewm(span=span).var()[:2].values
> array([nan,  0.00555309])

Now, using the ewmcov routine, and applying it to our specific case:

x0 = mrt.iloc[0]
x1 = mrt.iloc[1]
x2 = mrt.iloc[2]

# Assumption: alpha is the smoothing factor implied by the span (a in the question)
alpha = 2. / (1 + span)

# mean_x and mean_y are both the same, so here we call it y
# This is the same as mrt.ewm(span=span).mean(), I verified that too
y0 = x0
# y1 = mrt.ewm(span=span).mean().iloc[1]
y1 = ((1-alpha)*y0 + x1)/(1+(1-alpha))
#y2 = (((1-alpha)**2+(1-alpha))*y1 + x2) / (1 + (1-alpha) + (1-alpha)**2)

cov0 = 0

cov1 = (((1-alpha) * (cov0 + ((y0 - y1)**2))) +
                (1 * ((x1 - y1)**2))) / (1 + (1-alpha))

# After two observations (new_wt = 1): sum of weights and sum of squared weights
sum_wt = 1+(1-alpha)
sum_wt2 = 1+(1-alpha)**2


numerator = sum_wt * sum_wt # (1+(1-alpha))^2 = 1 + 2(1-alpha) + (1-alpha)^2
denominator = numerator - sum_wt2 # 2*(1-alpha)


print(np.nan,cov1*(numerator / denominator))

>(nan, 0.0055530905712123432)
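The same bookkeeping suggests why the formula from EDIT 2 works. What follows is a sketch of the argument, not something taken from the pandas documentation. It assumes `adjust=True`, no missing values, and enough observations that the weight sums have converged. The symbols $V_t$ (biased variance), $U_t$ (bias-corrected variance, which is what pandas reports), $m_t$ (exponentially weighted mean) and $d_t$ are introduced here for illustration; $a$ and `bias` are as defined in the question.

$$
\text{sum\_wt} \to \frac{1}{a}, \qquad
\text{sum\_wt2} \to \frac{1}{a(2-a)}, \qquad
\frac{\text{sum\_wt}^2}{\text{sum\_wt}^2 - \text{sum\_wt2}} \to \frac{2-a}{2(1-a)} = \text{bias}
$$

$$
m_t = (1-a)\,m_{t-1} + a\,x_t, \qquad d_t = x_t - m_{t-1}
$$

$$
V_t = (1-a)\left(V_{t-1} + (m_{t-1}-m_t)^2\right) + a\,(x_t - m_t)^2
    = (1-a)\left(V_{t-1} + a\,d_t^2\right)
$$

$$
U_t = \text{bias}\cdot V_t
    = (1-a)\left(U_{t-1} + \text{bias}\cdot a\,d_t^2\right)
$$

So the standard recursion holds for the biased variance $V_t$, while `ewm(...).var()` returns $U_t$. The first attempt multiplied the already-corrected previous variance by `bias` a second time; moving `bias` onto the squared-deviation term gives the last line above. For the earliest rows the weight sums have not yet converged, so this recursion is only approximate there.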
