Correlate a single time series with a large number of time series

Problem description

I have a large number (M) of time series, each with N time points, stored in an MxN matrix. Then I also have a separate time series with N time points that I would like to correlate with all the time series in the matrix.

An easy solution is to go through the matrix row by row and run numpy.corrcoef. However, I was wondering if there is a faster or more concise way to do this?
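For reference, the row-by-row approach described above might look like the following sketch. The function name `corr_rowwise_loop` is hypothetical, and random data stands in for the real time series.

```python
import numpy as np

def corr_rowwise_loop(A, B):
    """Correlate each row of A (M x N) with B (N,) using one np.corrcoef call per row."""
    # np.corrcoef(row, B) returns a 2 x 2 matrix; the off-diagonal entry
    # [0, 1] is the correlation between row and B.
    return np.array([np.corrcoef(row, B)[0, 1] for row in A])

# Illustrative data only (assumption): M = 5 series with N = 4 time points.
rng = np.random.default_rng(0)
A = rng.random((5, 4))
B = rng.random(4)

baseline = corr_rowwise_loop(A, B)
print(baseline.shape)
```

This makes M separate calls to `np.corrcoef`, which is the cost a vectorized version would avoid.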

Recommended answer

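Let's use the following formula for the (Pearson) correlation coefficient between two series X and Y of N points each:

$$ r = \frac{\sum_{j=1}^{N} (X_j - \bar{X})(Y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{N} (X_j - \bar{X})^2 \, \sum_{j=1}^{N} (Y_j - \bar{Y})^2}} $$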

You can apply this with X as the M x N array and Y as the separate time series of N elements that is to be correlated with each row of X. So, taking X and Y to be A and B respectively, a vectorized implementation would look something like this -

import numpy as np

# Row-wise mean of the input arrays, subtracted from the input arrays themselves
A_mA = A - A.mean(1)[:,None]
B_mB = B - B.mean()

# Sum of squares across rows
ssA = (A_mA**2).sum(1)
ssB = (B_mB**2).sum()

# Finally get corr coeff
out = np.dot(A_mA,B_mB.T).ravel()/np.sqrt(ssA*ssB)
# OR out = np.einsum('ij,j->i',A_mA,B_mB)/np.sqrt(ssA*ssB)
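The steps above can be wrapped in a small reusable function. This is a minimal sketch rather than a definitive implementation: the name `corr_with_rows` is hypothetical, and it assumes A is a 2-D array of shape (M, N) and B is a 1-D array of length N.

```python
import numpy as np

def corr_with_rows(A, B):
    """Pearson correlation of each row of A (M x N) with the 1-D series B (N,)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Center each row of A and center B.
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean()

    # Sum of squared deviations: one value per row of A, one scalar for B.
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum()

    # Numerator is a matrix-vector product; result has shape (M,).
    return A_mA @ B_mB / np.sqrt(ssA * ssB)

# Illustrative data only (assumption): 6 series with 10 time points each.
rng = np.random.default_rng(1)
A = rng.random((6, 10))
B = rng.random(10)
out = corr_with_rows(A, B)
print(out.shape)
```

Note that a constant row in A (or a constant B) has zero variance, so the division yields `nan` for that entry.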

Verifying the results -

In [115]: A
Out[115]: 
array([[ 0.1001229 ,  0.77201334,  0.19108671,  0.83574124],
       [ 0.23873773,  0.14254842,  0.1878178 ,  0.32542199],
       [ 0.62674274,  0.42252403,  0.52145288,  0.75656695],
       [ 0.24917321,  0.73416177,  0.40779406,  0.58225605],
       [ 0.91376553,  0.37977182,  0.38417424,  0.16035635]])

In [116]: B
Out[116]: array([ 0.18675642,  0.3073746 ,  0.32381341,  0.01424491])

In [117]: out
Out[117]: array([-0.39788555, -0.95916359, -0.93824771,  0.02198139,  0.23052277])

In [118]: np.corrcoef(A[0],B), np.corrcoef(A[1],B), np.corrcoef(A[2],B)
Out[118]: 
(array([[ 1.        , -0.39788555],
       [-0.39788555,  1.        ]]),
 array([[ 1.        , -0.95916359],
       [-0.95916359,  1.        ]]),
 array([[ 1.        , -0.93824771],
       [-0.93824771,  1.        ]]))
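The session above checks the first three rows by eye. As a sketch of how all rows could be checked at once, the snippet below re-enters the A and B values printed above (rounded to the digits shown) and compares the vectorized result against a per-row `np.corrcoef` loop using `np.allclose`. The variable names `reference` and `all_match` are illustrative.

```python
import numpy as np

# Values copied from the session output above.
A = np.array([[0.1001229 , 0.77201334, 0.19108671, 0.83574124],
              [0.23873773, 0.14254842, 0.1878178 , 0.32542199],
              [0.62674274, 0.42252403, 0.52145288, 0.75656695],
              [0.24917321, 0.73416177, 0.40779406, 0.58225605],
              [0.91376553, 0.37977182, 0.38417424, 0.16035635]])
B = np.array([0.18675642, 0.3073746, 0.32381341, 0.01424491])

# Vectorized correlation, same steps as the answer's code.
A_mA = A - A.mean(axis=1, keepdims=True)
B_mB = B - B.mean()
out = A_mA @ B_mB / np.sqrt((A_mA ** 2).sum(axis=1) * (B_mB ** 2).sum())

# Reference: one np.corrcoef call per row.
reference = np.array([np.corrcoef(row, B)[0, 1] for row in A])

all_match = np.allclose(out, reference)
print(all_match)
```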
