Calculating rolling correlation of pandas dataframes
Question
Is it possible to use the rolling window and correlation function in pandas to do a correlation of a shorter dataframe or series to a longer one, and get the result along the longer time series? Basically doing what the numpy.correlate method does, but instead of cross-correlation, doing pairwise correlations.
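import numpy as np

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]
print(x)
print(y)

# Correlate y against every len(y)-element window of x.
corrs = []
for i in range(len(x) - len(y) + 1):
    corrs.append(np.corrcoef(x[i:i + len(y)], y)[0, 1])
print(corrs)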
This gives:
[0.4472135954999579, 0.4472135954999579, 0.4472135954999579, 0.0, 0.8164965809277259, -0.4472135954999579, 0.8320502943378437, 0.0, -0.24253562503633297, 0.24253562503633297, -0.7683498199278325, 0.8451542547285166, -0.50709255283711]
Every combination of window size and the pairwise option gives either a series of NaN or a "ValueError: Length mismatch". In the simple test case I made, the result is always NaN or a single value, never a windowed series.
import pandas as pd

x = pd.DataFrame(x)
y = pd.DataFrame(y)

# Attempts without pairwise, trying both window sizes in both directions.
corr = y.rolling(np.shape(y)[0]).corr(x)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y)
print(corr)

# The same attempts with pairwise=True.
corr = y.rolling(np.shape(y)[0]).corr(x, pairwise=True)
print(corr)
corr = y.rolling(np.shape(x)[0]).corr(x, pairwise=True)
print(corr)
corr = x.rolling(np.shape(x)[0]).corr(y, pairwise=True)
print(corr)
corr = x.rolling(np.shape(y)[0]).corr(y, pairwise=True)
print(corr)
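A likely explanation for the NaN output, assuming pandas aligns the two objects on their index labels before windowing: y only has labels 0 to 3, so it is treated as NaN everywhere else on x's index, and only the window ending at label 3 is complete. A minimal sketch with Series (the name aligned is just for illustration):

```python
import pandas as pd

x = pd.Series([0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7])
y = pd.Series([4, 5, 4, 5])

# Rolling.corr pairs values by index label, not by position in the window.
# y has no labels beyond 3, so every window past label 3 contains NaN pairs.
aligned = x.rolling(len(y)).corr(y)

# Only the window covering labels 0..3 has four valid pairs.
print(aligned.dropna())
```

If that assumption holds, this accounts for the "single result, but no window" behaviour: the short series is never slid along the long one.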
Recommended answer
Use Rolling.apply with np.corrcoef, or with Series.corr. For Series.corr, each window needs the same index values as y, so Series.reset_index with drop=True is necessary:
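import numpy as np
import pandas as pd

x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]
# Reference result from the explicit loop in the question.
corrs = []
for i in range(len(x) - 3):
    corrs.append(np.corrcoef(x[i:i + 4], y)[0, 1])

x = pd.Series(x)
y = pd.Series(y)

# raw=True passes each window to the function as a NumPy array.
corr1 = x.rolling(len(y)).apply(lambda w: np.corrcoef(w, y)[0, 1], raw=True)
# raw=False passes each window as a Series; reset its index so it aligns with y.
corr2 = x.rolling(len(y)).apply(lambda w: w.reset_index(drop=True).corr(y), raw=False)
# Shift the loop result by 3 so each value sits at the last label of its window.
print(pd.concat([pd.Series(corrs).rename(lambda i: i + 3), corr1, corr2], axis=1))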
           0         1         2
0        NaN       NaN       NaN
1        NaN       NaN       NaN
2        NaN       NaN       NaN
3   0.447214  0.447214  0.447214
4   0.447214  0.447214  0.447214
5   0.447214  0.447214  0.447214
6   0.000000  0.000000  0.000000
7   0.816497  0.816497  0.816497
8  -0.447214 -0.447214 -0.447214
9   0.832050  0.832050  0.832050
10  0.000000  0.000000  0.000000
11 -0.242536 -0.242536 -0.242536
12  0.242536  0.242536  0.242536
13 -0.768350 -0.768350 -0.768350
14  0.845154  0.845154  0.845154
15 -0.507093 -0.507093 -0.507093
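Rolling.apply calls a Python function once per window, which can be slow on long series. As an alternative that is not part of the answer above, the same Pearson correlations can be computed in one vectorized pass with NumPy's sliding_window_view (available from NumPy 1.20). This is a minimal sketch; the function name rolling_corr_with_template is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_corr_with_template(x, y):
    """Pearson correlation of y with every len(y)-long window of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # One row per window: shape (len(x) - len(y) + 1, len(y)).
    windows = sliding_window_view(x, y.size)
    # Center each window and the template on their own means.
    xc = windows - windows.mean(axis=1, keepdims=True)
    yc = y - y.mean()
    # Pearson r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2)), per window.
    # A constant window has zero variance and yields NaN.
    numerator = xc @ yc
    denominator = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum())
    return numerator / denominator


x = [0, 1, 2, 3, 4, 5, 4, 7, 6, 9, 10, 5, 6, 4, 8, 7]
y = [4, 5, 4, 5]
corrs = rolling_corr_with_template(x, y)
print(corrs)
```

The result has one value per complete window (13 here), without the leading NaN rows. To line it up with the rolling output, wrap it in a Series indexed from len(y) - 1 onward.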