Bug in the autocorrelation plot from matplotlib's plt.acorr?

Question

I am plotting autocorrelation in Python. I used three ways to do it: 1. pandas, 2. matplotlib, 3. statsmodels. I found that the graph I got from matplotlib is not consistent with the other two. The code is:

import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf

# print out data
print(mydata.values)

#1. pandas
p = autocorrelation_plot(mydata)
plt.title('mydata')

#2. matplotlib
fig = plt.figure()
plt.acorr(mydata, maxlags=150)
plt.title('mydata')

#3. statsmodels.graphics.tsaplots.plot_acf
plot_acf(mydata)
plt.title('mydata')

plt.show()

The graphs are here: http://quant365.com/viewtopic.php?f=4&t=33

Recommended Answer

This is a result of different common definitions in statistics and signal processing. Basically, the signal processing definition assumes that you're going to handle the detrending yourself. The statistical definition assumes that subtracting the mean is all the detrending you'll do, and does it for you.
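To make the two definitions concrete, here is a minimal NumPy sketch. The helper names `acf_signal` and `acf_statistical` are hypothetical and chosen for illustration. Both functions normalize by the zero-lag value. The only difference is whether the mean is removed before correlating. A white-noise series shifted up by a constant shows the effect most clearly, because the offset alone keeps the raw version near 1.

```python
import numpy as np

def acf_signal(x):
    """Signal-processing style: correlate the raw data, no detrending."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode='full')
    # Zero lag sits in the middle of the 'full' output, at index len(x) - 1.
    return full[x.size - 1:] / full[x.size - 1]

def acf_statistical(x):
    """Statistics style: subtract the mean first, then correlate."""
    x = np.asarray(x, dtype=float)
    return acf_signal(x - x.mean())

# White noise shifted up by a constant offset of 10.
rng = np.random.default_rng(0)
x = 10 + rng.normal(0, 1, 200)

print(acf_signal(x)[:3])       # stays close to 1: the offset dominates
print(acf_statistical(x)[:3])  # falls to roughly 0 after lag 0
```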

First off, let's demonstrate the problem with a stand-alone example:

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from statsmodels.graphics import tsaplots

def label(ax, string):
    ax.annotate(string, (1, 1), xytext=(-8, -8), ha='right', va='top',
                size=14, xycoords='axes fraction', textcoords='offset points')

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()

fig, axes = plt.subplots(nrows=4, figsize=(8, 12))

axes[0].plot(data)
label(axes[0], 'Raw Data')

axes[1].acorr(data, maxlags=data.size-1)
label(axes[1], 'Matplotlib Autocorrelation')

tsaplots.plot_acf(data, ax=axes[2])
label(axes[2], 'Statsmodels Autocorrelation')

# pandas.tools.plotting no longer exists in current pandas; use pandas.plotting
pd.plotting.autocorrelation_plot(data, ax=axes[3])
label(axes[3], 'Pandas Autocorrelation')

# Remove some of the titles and labels that were automatically added
for ax in axes.flat:
    ax.set(title='', xlabel='')
fig.tight_layout()
plt.show()

So, why the heck am I saying that they're all correct? They're clearly different!

Let's write our own autocorrelation function to demonstrate what plt.acorr is doing:

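def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()
    # Raw correlation of the data with itself, no mean removal
    autocorr = np.correlate(x, x, mode='full')
    # Scale so that the zero-lag value (the peak) equals 1
    autocorr /= autocorr.max()

    return ax.stem(autocorr)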

If we plot this with our data, we'll get a more-or-less identical result to plt.acorr (I'm not labeling the lags properly, simply because I'm lazy):

fig, ax = plt.subplots()
acorr(data)
plt.show()
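As a sanity check, we can compare the hand-rolled result against the numbers `acorr` actually plots, without showing a figure. This assumes a Matplotlib version in which `Axes.acorr` returns the tuple `(lags, correlations, lines, baseline)`, as current releases do.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()

# Hand-rolled version: raw correlation, scaled so zero lag equals 1.
ours = np.correlate(data, data, mode='full')
ours /= ours.max()

# Axes.acorr returns the lags and the correlation values it draws.
fig, ax = plt.subplots()
lags, theirs, _, _ = ax.acorr(data, maxlags=data.size - 1)
plt.close(fig)

print(np.allclose(ours, theirs))  # the two curves agree
print(lags[0], lags[-1])          # lags run from -(N-1) to N-1
```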

This is a perfectly valid autocorrelation. It's all a matter of whether your background is signal processing or statistics.

This is the definition used in signal processing. The assumption is that you're going to handle detrending your data (note the detrend kwarg in plt.acorr). If you want it detrended, you'll explicitly ask for it (and probably do something better than just subtracting the mean), and otherwise it shouldn't be assumed.
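Here is a sketch of what the `detrend` kwarg gives you. It assumes `matplotlib.mlab.detrend_mean` and `matplotlib.mlab.detrend_linear` are available, as they are in current Matplotlib. Passing `detrend_mean` reproduces the statistical, mean-subtracted convention. Passing `detrend_linear`, which removes a least-squares straight line, is one example of doing more than subtracting the mean.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt
from matplotlib import mlab

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()
n = data.size

fig, ax = plt.subplots()
# Default: no detrending (signal-processing convention).
_, raw, _, _ = ax.acorr(data, maxlags=n - 1)
# Subtract the mean first (statistical convention).
_, demeaned, _, _ = ax.acorr(data, maxlags=n - 1, detrend=mlab.detrend_mean)
# Remove a least-squares straight line instead of just the mean.
_, delinear, _, _ = ax.acorr(data, maxlags=n - 1, detrend=mlab.detrend_linear)
plt.close(fig)

# Reference: mean-subtracted correlation, normalized at zero lag.
x = data - data.mean()
ref = np.correlate(x, x, mode='full') / np.dot(x, x)
print(np.allclose(demeaned, ref))  # detrend_mean matches the statistical ACF
```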

In statistics, simply subtracting the mean is assumed to be all the detrending you want.
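For reference, here is the statistics-side estimator written out as explicit sums. This is a minimal sketch, and the name `sample_acf` is hypothetical. It assumes the common estimator that divides every lag's sum by the same N, so the divisors cancel in the ratio. As far as I know, this is the default estimator in both pandas' `autocorrelation_plot` and statsmodels' `acf`.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags:

    r_k = sum_{t=0}^{N-1-k} (x_t - xbar)(x_{t+k} - xbar) / sum_{t=0}^{N-1} (x_t - xbar)^2
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()              # the only "detrending": subtract the mean
    c0 = np.sum(d * d) / n        # lag-0 autocovariance
    r = []
    for k in range(nlags + 1):
        ck = np.sum(d[:n - k] * d[k:]) / n  # same divisor n at every lag
        r.append(ck / c0)
    return np.array(r)

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()
print(sample_acf(data, 5))
```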

Both of the other functions (pandas and statsmodels) subtract the mean of the data before correlating, similar to this:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    autocorr /= autocorr.max()

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()

However, we still have one large difference. This one is purely a plotting convention.

In most signal processing textbooks (that I've seen, anyway), the "full" autocorrelation is displayed, such that zero lag is in the center and the result is symmetric on each side. R, on the other hand, has the very reasonable convention of displaying only one side of it. (After all, the other side is completely redundant.) The statistical plotting functions follow the R convention, and plt.acorr follows what Matlab does, which is the opposite convention.
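Here is why the other side is redundant. For a real-valued series, the "full" output of `np.correlate` has length 2N - 1, holds the zero-lag value at the center index N - 1, and is a mirror image around that index. A short NumPy check on the same random walk:

```python
import numpy as np

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()
x = data - data.mean()
n = x.size

full = np.correlate(x, x, mode='full')  # length 2*n - 1
center = n - 1                          # index of lag 0

print(full.size == 2 * n - 1)       # True
print(np.argmax(full) == center)    # zero lag is the peak
# Negative lags, read backwards, equal the positive lags.
print(np.allclose(full[:center][::-1], full[center + 1:]))
```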

Basically, you'd want this:

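def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    # Keep lags 0..N-1; zero lag is at index x.size - 1 of the 'full' output
    autocorr = autocorr[x.size - 1:]
    # Normalize by the zero-lag value so that lag 0 equals 1
    autocorr /= autocorr[0]

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()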
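The lag axis in the function above is just the array index. The following slightly extended sketch labels the lags explicitly, caps how many are shown, and returns the values so they can be compared with other tools. The name `acorr_onesided` and its `maxlags` parameter are hypothetical, loosely mirroring `maxlags` in `plt.acorr`.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend
import matplotlib.pyplot as plt

def acorr_onesided(x, maxlags=None, ax=None):
    """Mean-subtracted, one-sided autocorrelation with labeled lags."""
    if ax is None:
        ax = plt.gca()
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    if maxlags is None:
        maxlags = n - 1
    full = np.correlate(x, x, mode='full')
    # Lags 0..maxlags, normalized by the zero-lag value.
    autocorr = full[n - 1:n + maxlags] / full[n - 1]
    lags = np.arange(maxlags + 1)
    ax.stem(lags, autocorr)
    ax.set(xlabel='Lag', ylabel='Autocorrelation')
    return lags, autocorr

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()
fig, ax = plt.subplots()
lags, ac = acorr_onesided(data, maxlags=40, ax=ax)
plt.close(fig)
```

Slicing `full[n - 1:n + maxlags]` keeps lags 0 through `maxlags`. Normalizing by `full[n - 1]`, the zero-lag value, rather than by the maximum of the sliced array, keeps lag 0 pinned at exactly 1.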
