What's the difference between pandas ACF and statsmodels ACF?

Question

I'm calculating the autocorrelation function (ACF) of a stock's returns. To do so, I tested two functions: the autocorr method built into pandas and the acf function from statsmodels.tsa.stattools. This is done in the following MWE:

import datetime

import matplotlib.pyplot as plt
import pandas as pd
from dateutil.relativedelta import relativedelta
from pandas_datareader import data
from statsmodels.tsa.stattools import acf

ticker = 'AAPL'
time_ago = datetime.datetime.today().date() - relativedelta(months=6)

# Daily percentage returns of the adjusted close price
ticker_data = data.get_data_yahoo(ticker, time_ago)['Adj Close'].pct_change().dropna()

# Lags 1..31 from each implementation; nlags is passed explicitly because
# recent statsmodels versions return fewer lags by default
ticker_data_acf_1 = acf(ticker_data, nlags=31)[1:32]
ticker_data_acf_2 = [ticker_data.autocorr(i) for i in range(1, 32)]

test_df = pd.DataFrame([ticker_data_acf_1, ticker_data_acf_2]).T
# Column order follows the lists above: statsmodels acf first, pandas autocorr second
test_df.columns = ['Statsmodels Autocorr', 'Pandas Autocorr']
test_df.index += 1  # label rows by lag (1..31)
test_df.plot(kind='bar')
plt.show()

I noticed that the values they return are not identical.

What accounts for this difference, and which values should be used?

Answer

The difference between the pandas and statsmodels versions lies in the mean subtraction and the normalization (division by the variance):

  • autocorr does nothing more than pass two overlapping subseries of the original series (the series without its last lag values, and the series without its first lag values) to np.corrcoef. Inside that function, the sample mean and sample variance of each subseries are used to compute the correlation coefficient.
  • acf, by contrast, uses the sample mean and sample variance of the whole series to compute the correlation coefficient.
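Written out as formulas (a sketch in our own notation, not copied from either library's documentation), for a series $x_1, \dots, x_n$ and lag $k$: pandas correlates the head subseries $y_1 = (x_1, \dots, x_{n-k})$ with the tail subseries $y_2 = (x_{k+1}, \dots, x_n)$, each centred on its own mean $\bar{y}_1$, $\bar{y}_2$, while statsmodels centres both on the overall mean $\bar{x}$ and divides by the overall variance. The normalizer $c_k$ is $n$ by default and $n - k$ with `adjusted=True`.

```latex
r_k^{\text{pandas}} =
  \frac{\sum_{t=1}^{n-k} (x_t - \bar{y}_1)(x_{t+k} - \bar{y}_2)}
       {\sqrt{\sum_{t=1}^{n-k} (x_t - \bar{y}_1)^2}\,\sqrt{\sum_{t=1}^{n-k} (x_{t+k} - \bar{y}_2)^2}},
\qquad
\bar{y}_1 = \frac{1}{n-k}\sum_{t=1}^{n-k} x_t,\quad
\bar{y}_2 = \frac{1}{n-k}\sum_{t=1}^{n-k} x_{t+k}

r_k^{\text{statsmodels}} =
  \frac{\frac{1}{c_k}\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}
       {\frac{1}{n}\sum_{t=1}^{n} (x_t - \bar{x})^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t
```

With $c_k = n - k$ the second formula is what `acf_by_hand` in the MWE below computes, and the first is what `autocorr_by_hand` computes.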

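The difference shrinks for longer time series, where the two overlapping subseries cover almost the whole series and their means and variances are therefore close to the overall ones. For short series, or for lags that are a sizeable fraction of the series length, it can be large.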
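To see the length effect, here is a NumPy-only sketch that measures the gap at a fixed lag on a linear ramp of increasing length. The helper names `pandas_style_autocorr` and `statsmodels_style_acf` are hypothetical: they re-implement the two estimators described above rather than calling pandas or statsmodels, and the second one assumes `adjusted=True` (divide by n - lag), as in the MWE below. On a ramp the pandas-style value is always 1, because the two subseries are perfectly linearly related, so the gap is simply how far the whole-series ACF falls below 1. This is one deterministic example, not a general bound.

```python
import numpy as np


def pandas_style_autocorr(x, lag):
    """Pearson correlation of x[:n-lag] and x[lag:], each subseries centred on its own mean."""
    n = len(x)
    return np.corrcoef(x[: n - lag], x[lag:])[0, 1]


def statsmodels_style_acf(x, lag, adjusted=True):
    """ACF centred on the whole-series mean and scaled by the whole-series variance."""
    n = len(x)
    d = x - x.mean()
    # Lagged autocovariance: divide by n - lag (adjusted) or by n (default in statsmodels)
    acov_k = np.sum(d[: n - lag] * d[lag:]) / ((n - lag) if adjusted else n)
    acov_0 = np.sum(d * d) / n  # variance of the whole series
    return acov_k / acov_0


lag = 5
gaps = {}
for n in (20, 50, 200, 1000):
    x = np.arange(n, dtype=float)  # linear ramp 0, 1, ..., n - 1
    gaps[n] = pandas_style_autocorr(x, lag) - statsmodels_style_acf(x, lag)
    print(f"n = {n:4d}: pandas-style minus statsmodels-style at lag {lag} = {gaps[n]:.3f}")
```

For this ramp the whole-series value works out to $((n-k)^2 - 1 - 3k^2)/(n^2 - 1)$, which tends to 1 as $n$ grows with $k$ fixed, so the gap falls from about 0.63 at $n = 20$ to about 0.01 at $n = 1000$.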

Compared to Matlab, the pandas autocorr function probably corresponds to running Matlab's xcorr (cross-correlation) on the series and its lagged copy, rather than Matlab's autocorr, which computes the sample autocorrelation (this is a guess from the docs; I cannot verify it because I have no access to Matlab).

See this MWE for clarification:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf
import matplotlib.pyplot as plt
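# Colorblind-friendly style; "seaborn-colorblind" was deprecated in matplotlib 3.6 and is gone in recent releases
plt.style.use("tableau-colorblind10")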

def autocorr_by_hand(x, lag):
    # Slice the relevant subseries based on the lag
    y1 = x[:(len(x)-lag)]
    y2 = x[lag:]
    # Subtract the subseries means
    sum_product = np.sum((y1-np.mean(y1))*(y2-np.mean(y2)))
    # Normalize with the subseries stds
    return sum_product / ((len(x) - lag) * np.std(y1) * np.std(y2))

def acf_by_hand(x, lag):
    # Slice the relevant subseries based on the lag
    y1 = x[:(len(x)-lag)]
    y2 = x[lag:]
    # Subtract the mean of the whole series x to calculate Cov
    sum_product = np.sum((y1-np.mean(x))*(y2-np.mean(x)))
    # Normalize with var of whole series
    return sum_product / ((len(x) - lag) * np.var(x))

x = np.linspace(0,100,101)

results = {}
nlags=10
results["acf_by_hand"] = [acf_by_hand(x, lag) for lag in range(nlags)]
results["autocorr_by_hand"] = [autocorr_by_hand(x, lag) for lag in range(nlags)]
results["autocorr"] = [pd.Series(x).autocorr(lag) for lag in range(nlags)]
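# adjusted=True (named unbiased in older statsmodels releases) divides by n - lag, like acf_by_hand
results["acf"] = acf(x, adjusted=True, nlags=nlags-1)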

pd.DataFrame(results).plot(kind="bar", figsize=(10,5), grid=True)
plt.xlabel("lag")
plt.ylim([-1.2, 1.2])
plt.ylabel("value")
plt.show()

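Statsmodels computes the lagged sums for all lags at once with np.correlate (or with an FFT, depending on its fft argument) instead of looping over lags, but this is essentially how it works.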
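A sketch of that vectorized idea in plain NumPy. `acf_via_correlate` is a hypothetical name, not a statsmodels function, and it ignores statsmodels features such as missing-value handling, confidence intervals and the FFT path. A single np.correlate call in "full" mode yields the lagged sums of products of the demeaned series for every lag; dividing by n (default) or n - k (adjusted) gives the autocovariances, and dividing by the lag-0 value gives the ACF.

```python
import numpy as np


def acf_via_correlate(x, nlags, adjusted=False):
    """Sample ACF for lags 0..nlags using one np.correlate call (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()  # centre on the mean of the WHOLE series
    # "full" output has length 2n - 1 and is symmetric; index n - 1 holds lag 0,
    # and index n - 1 + k holds sum_t d[t] * d[t + k]
    lagged_sums = np.correlate(d, d, mode="full")[n - 1:]
    # Autocovariance: divide by n - k (adjusted) or by n (default)
    denom = (n - np.arange(n)) if adjusted else n
    acov = lagged_sums / denom
    # Normalize by the lag-0 autocovariance, i.e. the variance of the whole series
    return acov[: nlags + 1] / acov[0]


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
r = acf_via_correlate(x, nlags=10, adjusted=True)
print(np.round(r[:4], 3))
```

With `adjusted=True` the result should agree with `acf_by_hand` from the MWE above for every lag; without it, each lag k is additionally scaled by (n - k) / n.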
