Why do NumPy correlate and corrcoef return different values, and how to "normalize" a correlate in "full" mode?


Question

I'm trying to use some Time Series Analysis in Python, using Numpy.

I have two somewhat medium-sized series, with 20k values each and I want to check the sliding correlation.

corrcoef gives me a matrix of auto-correlation/correlation coefficients as output. That is not useful by itself in my case, as one of the series contains a lag.

The correlate function (in mode="full") returns a 40k elements list that DO look like the kind of result I'm aiming for (the peak value is as far from the center of the list as the Lag would indicate), but the values are all weird - up to 500, when I was expecting something from -1 to 1.

I can't just divide it all by the max value; I know the max correlation isn't 1.

How could I normalize the "cross-correlation" (correlation in "full" mode) so the return values would be the correlation at each lag step instead of those very large, strange values?

Recommended answer

You are looking for normalized cross-correlation. This option isn't available yet in Numpy, but a patch is waiting for review that does just what you want. It shouldn't be too hard to apply it I would think. Most of the patch is just doc string stuff. The only lines of code that it adds are

if normalize:
    a = (a - mean(a)) / (std(a) * len(a))
    v = (v - mean(v)) /  std(v)

where a and v are the inputted numpy arrays of which you are finding the cross-correlation. It shouldn't be hard to either add them into your own distribution of Numpy or just make a copy of the correlate function and add the lines there. I would do the latter personally if I chose to go this route.

Another, quite possibly better, alternative is to just normalize the input vectors before you send them to correlate. It's up to you which way you would like to do it.
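As a minimal sketch of that second alternative: the helper below applies the same two normalization lines to the inputs and then calls np.correlate unchanged. The name normalized_correlate and the synthetic test data are assumptions made for illustration; they are not part of NumPy or of the patch. It also assumes both series have the same length, as in the question.

```python
import numpy as np

def normalized_correlate(a, v, mode="full"):
    """Hypothetical helper: normalize both inputs, then cross-correlate."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Same normalization as the patch lines quoted above.
    a = (a - a.mean()) / (a.std() * len(a))
    v = (v - v.mean()) / v.std()
    return np.correlate(a, v, mode=mode)

# Synthetic example (assumed data): y is x delayed by 25 samples plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 25) + 0.5 * rng.standard_normal(500)

ncc = normalized_correlate(y, x)

# In "full" mode, output index i corresponds to lag i - (len(v) - 1).
lags = np.arange(-(len(x) - 1), len(y))
best_lag = lags[np.argmax(ncc)]
```

With equal-length inputs every value of ncc lies between -1 and 1, and best_lag is the shift at which the two series line up best. Values far from lag 0 are computed from fewer overlapping samples but still divided by the full length, so they shrink toward zero.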

By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation, except for dividing by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the population standard deviation (divide by n) vs. the sample standard deviation (divide by n-1), and for series of this length it really won't make much of a difference in my opinion.
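A small check of the len(a) vs. (len(a)-1) point, assuming NumPy's default np.std (ddof=0, the population standard deviation) and made-up data: dividing by n with the population standard deviation, or by n-1 with the sample standard deviation (ddof=1), both reproduce the coefficient from np.corrcoef at zero lag. Only mixing the two conventions introduces a factor of n/(n-1).

```python
import numpy as np

# Assumed synthetic data, for illustration only.
rng = np.random.default_rng(1)
a = rng.standard_normal(200)
v = 0.6 * a + rng.standard_normal(200)
n = len(a)

# Sum of products of the centered series (the zero-lag correlation term).
cross = np.sum((a - a.mean()) * (v - v.mean()))

# Population std (ddof=0) paired with 1/n, as in the patch lines.
r_pop = cross / (a.std() * v.std() * n)

# Sample std (ddof=1) paired with 1/(n-1).
r_samp = cross / (a.std(ddof=1) * v.std(ddof=1) * (n - 1))

# Mixed convention: population std with 1/(n-1).
r_mixed = cross / (a.std() * v.std() * (n - 1))

# Reference value from corrcoef (off-diagonal entry of the 2x2 matrix).
r_ref = np.corrcoef(a, v)[0, 1]
```

Here r_pop and r_samp agree with r_ref, while r_mixed is larger by the factor n/(n-1), which is about 0.005% for 20k samples.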
