numpy and statsmodels give different values when calculating correlations. How should this be interpreted?
Question
I can't find a reason why calculating the correlation between two series A and B using numpy.correlate gives me different results from the ones I obtain using statsmodels.tsa.stattools.ccf.
Here is an example of the difference:
import numpy as np
from matplotlib import pyplot as plt
from statsmodels.tsa.stattools import ccf

# Calculate correlation using numpy.correlate
def corr(x, y):
    result = np.correlate(x, y, mode='full')
    # Integer division: keep lags 0..n-1 (second half of the 'full' output)
    return result[result.size // 2:]

# These are the data series I want to analyze
A = np.array([np.absolute(x) for x in np.arange(-1, 1.1, 0.1)])
B = np.array([x for x in np.arange(-1, 1.1, 0.1)])

# Using numpy I get this
plt.plot(corr(B, A))

# Using statsmodels I get this
# (recent statsmodels releases name this parameter `adjusted` instead of `unbiased`)
plt.plot(ccf(B, A, unbiased=False))
The results look qualitatively different. Where does this difference come from?
Recommended answer
statsmodels.tsa.stattools.ccf is based on np.correlate, but it does some additional work to return the correlation in the statistical sense rather than the signal-processing sense; see the Wikipedia article on cross-correlation. Specifically, it subtracts the mean from each series, divides by the number of observations, and then divides by the product of the two standard deviations. You can see exactly what happens in the source code, which is very simple.
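As a rough summary of the two conventions (a sketch in my own notation, for lag k >= 0 and series of length n, not taken from either library's documentation), the raw quantity returned by np.correlate and the normalized quantity returned by ccf with unbiased=False can be written as:

```latex
% Signal-processing sense: what np.correlate(x, y, 'full')[n-1+k] computes
c_k = \sum_{t=0}^{n-1-k} x_{t+k}\, y_t

% Statistical sense: what ccf(x, y, unbiased=False)[k] computes
\hat{\rho}_k = \frac{1}{n\,\sigma_x\,\sigma_y}
               \sum_{t=0}^{n-1-k} (x_{t+k} - \bar{x})\,(y_t - \bar{y})
```

Here \bar{x} and \bar{y} are the sample means and \sigma_x, \sigma_y are the standard deviations as computed by np.std (dividing by n). With unbiased=True the divisor n is replaced by n - k.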
For easier reference, I have copied the relevant lines below:
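import numpy as np

def ccovf(x, y, unbiased=True, demean=True):
    n = len(x)
    if demean:
        # Subtract the mean so the result is a covariance, not a raw product sum
        xo = x - x.mean()
        yo = y - y.mean()
    else:
        xo = x
        yo = y
    if unbiased:
        # Number of overlapping terms at each lag: n - |k|
        xi = np.ones(n)
        d = np.correlate(xi, xi, 'full')
    else:
        d = n
    # Keep lags 0..n-1 only
    return (np.correlate(xo, yo, 'full') / d)[n - 1:]

def ccf(x, y, unbiased=True):
    cvf = ccovf(x, y, unbiased=unbiased, demean=True)
    # Normalize the covariance by both standard deviations
    return cvf / (np.std(x) * np.std(y))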
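To make the difference concrete, here is a minimal sketch that reproduces both calculations with NumPy alone. The helper names raw_xcorr and stat_xcorr are hypothetical and introduced only for illustration, and np.linspace is used in place of the question's np.arange to get the same 21-point grid without floating-point endpoint surprises. It assumes the biased variant (unbiased=False) used in the question.

```python
import numpy as np

def raw_xcorr(x, y):
    """Signal-processing sense: plain sums of products for lags 0..n-1."""
    n = len(x)
    return np.correlate(x, y, mode="full")[n - 1:]

def stat_xcorr(x, y):
    """Statistical sense (biased): demean, divide by n and by both std devs."""
    n = len(x)
    xo = x - x.mean()
    yo = y - y.mean()
    ccov = np.correlate(xo, yo, mode="full")[n - 1:] / n
    return ccov / (np.std(x) * np.std(y))

# Same shape of data as in the question: B is a ramp, A is its absolute value
B = np.linspace(-1.0, 1.0, 21)
A = np.abs(B)

raw = raw_xcorr(B, A)
stat = stat_xcorr(B, A)

# Rescaling alone does not turn the raw result into the statistical one,
# because A has a nonzero mean that the raw version never subtracts.
scaled_raw = raw / (len(B) * np.std(B) * np.std(A))
print("rescaled raw matches statistical:", np.allclose(scaled_raw, stat))

# At lag 0 the statistical version is the ordinary Pearson correlation.
print("lag-0 matches np.corrcoef:", np.isclose(stat[0], np.corrcoef(B, A)[0, 1]))
```

In this example the mean of A is well above zero, so skipping the demeaning step changes the shape of the curve and not just its scale, which is why the two plots in the question look qualitatively different.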