Calculating rolling correlation of pandas dataframes

Question

Is it possible to use the rolling window and correlation functions in pandas to correlate a shorter dataframe or series against a longer one, and get the result along the longer time series? Basically, I want what numpy.correlate does, except that each window should yield a correlation coefficient rather than a cross-correlation value.

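import numpy as np

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]
print(x)
print(y)

# Correlate y against every window of x that has the same length as y
corrs = []
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])
print(corrs)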

The resulting correlations are:

[0.4472135954999579, 0.4472135954999579, 0.4472135954999579, 0.0, 0.8164965809277259, -0.4472135954999579, 0.8320502943378437, 0.0, -0.24253562503633297, 0.24253562503633297, -0.7683498199278325, 0.8451542547285166, -0.50709255283711]

Every combination of window size and pairwise setting that I tried gives either a series of NaN or a "ValueError: Length mismatch". In the simple test case I made, the result is always either NaN or a single value, never a value per window.

import pandas as pd

x = pd.DataFrame(x)
y = pd.DataFrame(y)

corr = y.rolling(np.shape(y)[0]).corr(x)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y)
print(corr)
corr = y.rolling(np.shape(y)[0]).corr(x,pairwise=True)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x,pairwise=True)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y,pairwise=True)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y,pairwise=True)
print(corr)

Recommended answer

Use Rolling.apply with either np.corrcoef or Series.corr. Series.corr aligns the two series on their index labels, so each window needs the same index values as y, which is why Series.reset_index with drop=True is necessary:

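import numpy as np
import pandas as pd

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]

# Reference result from the explicit loop in the question
corrs = []
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])

x = pd.Series(x)
y = pd.Series(y)

# raw=True passes each window as a NumPy array, so no index alignment happens
corr1 = x.rolling(len(y)).apply(lambda w: np.corrcoef(w, y)[0, 1], raw=True)
# raw=False passes each window as a Series; reset its index to match y's 0..3
corr2 = x.rolling(len(y)).apply(lambda w: w.reset_index(drop=True).corr(y), raw=False)

# Shift the loop result by 3 so it lines up with the end of each rolling window
print(pd.concat([pd.Series(corrs).rename(lambda i: i + 3), corr1, corr2], axis=1))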
           0         1         2
0        NaN       NaN       NaN
1        NaN       NaN       NaN
2        NaN       NaN       NaN
3   0.447214  0.447214  0.447214
4   0.447214  0.447214  0.447214
5   0.447214  0.447214  0.447214
6   0.000000  0.000000  0.000000
7   0.816497  0.816497  0.816497
8  -0.447214 -0.447214 -0.447214
9   0.832050  0.832050  0.832050
10  0.000000  0.000000  0.000000
11 -0.242536 -0.242536 -0.242536
12  0.242536  0.242536  0.242536
13 -0.768350 -0.768350 -0.768350
14  0.845154  0.845154  0.845154
15 -0.507093 -0.507093 -0.507093
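
The index requirement mentioned above can be seen on a single window. The following is a minimal sketch, not part of the original answer: it takes one slice of x by hand (the variable names window, misaligned and aligned are only illustrative) and compares Series.corr before and after resetting the index.

```python
import numpy as np
import pandas as pd

x = pd.Series([0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7])
y = pd.Series([4, 5, 4, 5])

# A 4-element slice from the middle of x keeps its original labels 4..7.
window = x.iloc[4:8]

# Series.corr aligns on index labels first. Labels 4..7 and 0..3 share
# nothing, so no pairs are left to correlate and the result is NaN.
misaligned = window.corr(y)

# Resetting the window's index to 0..3 makes its labels match y's.
aligned = window.reset_index(drop=True).corr(y)

print(misaligned)
print(round(aligned, 6))
```

The second value should match row 7 of the table above, since that row corresponds to the window ending at index 7. With raw=True the window arrives as a plain NumPy array, so there are no labels to align and np.corrcoef can be applied directly.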
