Autocorrelation to estimate periodicity with numpy

Question

I have a large set of time series (more than 500) and I'd like to select only the periodic ones. I did a bit of literature research and found that I should look at autocorrelation. Using numpy, I calculate the autocorrelation as:

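import numpy as np

def autocorr(x):
    norm = x - np.mean(x)
    result = np.correlate(norm, norm, mode='full')
    # keep lags 0 .. n-1 (integer division is required in Python 3)
    acorr = result[result.size // 2:]
    # divide by the variance and by the number of overlapping samples at each lag
    acorr /= (x.var() * np.arange(x.size, 0, -1))
    return acorr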

This returns a set of coefficients (r?) that, when plotted, should tell me whether the time series is periodic.

I generated two toy examples:

#random signal
s1 = np.random.randint(5, size=80)
#periodic signal
s2 = np.array([5,2,3,1] * 20)

When I generate the autocorrelation plots (the images are not preserved here), the second autocorrelation vector clearly indicates some periodicity:

Autocorr1 =  [1, 0.28, -0.06,  0.19, -0.22, -0.13,  0.07 ..]
Autocorr2 =  [1, -0.50, -0.49,  1, -0.50, -0.49,  1 ..]

My question is: how can I automatically determine from the autocorrelation vector whether a time series is periodic? Is there a way to summarise the values into a single coefficient, e.g. 1 for perfect periodicity and 0 for no periodicity at all? I tried calculating the mean, but it is not meaningful. Should I count the number of 1s?

Answer

I would use mode='same' instead of mode='full', because with mode='full' we get covariances for extreme shifts where just one array element overlaps itself and the rest are zeros. Those are not going to be interesting. With mode='same', at least half of the shifted array overlaps the original one.
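A quick way to see what mode='same' keeps is to compare it with mode='full' on a test series. This is a sketch assuming a random normal series of length 80 (the toy examples' length); the seed and the names `lags` and `overlap` are illustrative only. It checks that numpy centres the 'same' output on zero lag at index n//2, which is what the answer's `result[n//2 + 1:]` slicing relies on.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed; any series works here
x = rng.normal(size=80)
n = x.size
norm = x - x.mean()

full = np.correlate(norm, norm, mode='full')   # lags -(n-1) .. n-1
same = np.correlate(norm, norm, mode='same')   # the n central entries of 'full'

# Zero lag is at index n - 1 in 'full' and at index n // 2 in 'same'.
start = (n - 1) - n // 2
assert np.allclose(same, full[start:start + n])

lags = np.arange(n) - n // 2   # lag represented by each entry of `same`
overlap = n - np.abs(lags)     # samples shared by x and its shifted copy
print(full.size, same.size)    # 159 80
print(lags.min(), lags.max())  # -40 39
print(overlap.min())           # 40
```

The smallest overlap in 'same' is n//2 samples, i.e. half the series, whereas 'full' goes all the way down to a single overlapping sample.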

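Also, to get the true correlation coefficient (r) you need to divide by the size of the overlap, not by the size of the original x (in my code these overlap sizes are np.arange(n-1, n//2, -1)). Then each of the outputs will lie between -1 and 1 (approximately, since the variance of the whole series is used rather than that of each overlapping segment).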
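The effect of the overlap normalisation is easy to see on the question's periodic toy signal s2. The sketch below (variable names `by_overlap` and `by_length` are illustrative) divides the same raw correlations once by the overlap size and once by the full length n, and prints both at lags 4 and 36.

```python
import numpy as np

s2 = np.array([5, 2, 3, 1] * 20, dtype=float)   # periodic toy signal from the question
n = s2.size
norm = s2 - s2.mean()
same = np.correlate(norm, norm, mode='same')
positive = same[n // 2 + 1:]                    # lags 1 .. n - 1 - n // 2

# Divide by the number of overlapping samples at each lag (n - lag) ...
by_overlap = positive / (s2.var() * np.arange(n - 1, n // 2, -1))
# ... versus dividing every lag by the full length n.
by_length = positive / (s2.var() * n)

print(by_overlap[3], by_length[3])     # lag 4, prints: 1.0 0.95
print(by_overlap[35], by_length[35])   # lag 36, prints: 1.0 0.55
```

Dividing by n makes the coefficient shrink with the lag simply because fewer samples overlap, even though the signal repeats perfectly; dividing by the overlap keeps it at 1.0.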

A glance at the Durbin–Watson statistic, which is approximately 2(1 - r), suggests that values below 1 are commonly considered a significant indication of autocorrelation; this corresponds to r > 0.5, which is the threshold I use below. For a statistically sound treatment of the significance of autocorrelation, refer to the statistics literature; a starting point would be to have a model for your time series.
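To make the DW ≈ 2(1 - r) link concrete, the sketch below computes both quantities for a hypothetical AR(1) series (phi = 0.8, arbitrary seed), using the textbook definition of the statistic rather than any library function. `durbin_watson` and `lag1_r` are illustrative helper names, and r here is the plain lag-1 estimate, slightly different from the overlap-normalised r in the answer's code.

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive differences / sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_r(e):
    """Simple lag-1 autocorrelation estimate: sum(e_t * e_{t-1}) / sum(e_t ** 2)."""
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Hypothetical AR(1) test series: e_t = phi * e_{t-1} + noise, with phi = 0.8.
rng = np.random.default_rng(1)
phi = 0.8
e = np.zeros(2000)
for t in range(1, e.size):
    e[t] = phi * e[t - 1] + rng.normal()
e -= e.mean()

dw = durbin_watson(e)
r = lag1_r(e)
# Exact identity: DW = 2(1 - r) - (e_1^2 + e_T^2) / sum(e^2); the edge term is tiny for long series.
edge = (e[0] ** 2 + e[-1] ** 2) / np.sum(e ** 2)
print(dw, 2 * (1 - r), edge)
```

Because DW equals 2(1 - r) minus a small edge term, DW < 1 corresponds to r > 0.5 for long series, which is the cutoff the answer's autocorr() applies.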

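import numpy as np

def autocorr(x):
    n = x.size
    norm = x - np.mean(x)
    # mode='same' keeps the n central lags; zero lag is at index n//2
    result = np.correlate(norm, norm, mode='same')
    # positive lags 1 .. n-1-n//2, each divided by variance * overlap size
    acorr = result[n//2 + 1:] / (x.var() * np.arange(n-1, n//2, -1))
    # strongest (positive or negative) autocorrelation and its lag
    lag = np.abs(acorr).argmax() + 1
    r = acorr[lag-1]
    if np.abs(r) > 0.5:
        print('Appears to be autocorrelated with r = {}, lag = {}'.format(r, lag))
    else:
        print('Appears to be not autocorrelated')
    return r, lag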

Output for your two toy examples:

Appears to be not autocorrelated
Appears to be autocorrelated with r = 1.0, lag = 4
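The question's actual goal is to screen a large batch (> 500) of series. A minimal sketch of that, assuming the answer's |r| > 0.5 rule is an acceptable screening criterion: `max_autocorr` repeats the computation of autocorr() above without printing, while `select_autocorrelated` and the example series (a longer noise series, s2, and a sine with period 25) are hypothetical additions for illustration.

```python
import numpy as np

def max_autocorr(x):
    """Silent variant of autocorr() above: return (r, lag) for the lag with the
    largest |r| among lags 1 .. n - 1 - n//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    norm = x - x.mean()
    result = np.correlate(norm, norm, mode='same')
    acorr = result[n // 2 + 1:] / (x.var() * np.arange(n - 1, n // 2, -1))
    lag = int(np.abs(acorr).argmax()) + 1
    return float(acorr[lag - 1]), lag

def select_autocorrelated(series, threshold=0.5):
    """Return indices of the series whose strongest autocorrelation exceeds threshold."""
    selected = []
    for i, x in enumerate(series):
        r, _ = max_autocorr(x)
        if abs(r) > threshold:
            selected.append(i)
    return selected

rng = np.random.default_rng(42)                  # arbitrary seed
series = [
    rng.integers(0, 5, size=400),                # noise, like s1 but longer
    np.array([5, 2, 3, 1] * 20),                 # s2 from the question
    np.sin(2 * np.pi * np.arange(300) / 25),     # sine with period 25
]
print(select_autocorrelated(series))
print(max_autocorr(series[1]))                   # (1.0, 4)
```

This flags any strong autocorrelation, not only periodicity: a slowly drifting or trending series also has r close to 1 at lag 1, so it is better treated as a first screening step than as proof that a series is periodic.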
