R: Efficiently locating time series segments with maximal cross-correlation to input segment?
Problem description
I have a long numerical time series of approximately 200,000 rows (let's call it Z).
In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query point q.
I want to locate within Z the y (~300) time series segments of length x that are most highly correlated with q.
What is an efficient way to do this?
Recommended answer
The code below finds the 300 segments you are looking for and runs in 8 seconds on my none-too-powerful Windows laptop, so it should be fast enough for your purposes.
First, it constructs a 30-by-199971 matrix (Zmat), whose columns contain all of the length-30 "time series segments" you want to examine. A single call to cor(), operating on the vector q and the matrix Zmat, then calculates all of the desired correlation coefficients. Finally, the resulting vector is examined to identify the 300 segments with the highest correlation coefficients.
# Simulate data
nZ <- 200000
nq <- 30
Z <- rnorm(nZ)
q <- seq_len(nq)
# From Z, construct a 30 by 199971 matrix, in which each column is a
# "time series segment". Column 1 contains observations 1:30, column 2
# contains observations 2:31, and so on through the end of the series.
Zmat <- sapply(seq_len(nZ - nq + 1),
               FUN = function(X) Z[seq(from = X, length.out = nq)])
# Calculate the correlation of q with every column/"time series segment".
Cors <- cor(q, Zmat)
# Extract the starting position of the 300 most highly correlated segments
ids <- order(Cors, decreasing=TRUE)[1:300]
# Maybe try something like the following to confirm that you have
# selected the most highly correlated segments.
hist(Cors, breaks=100)
hist(Cors[ids], col="red", add=TRUE)
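As a sanity check on the approach above, here is a minimal sketch on a much smaller series. It assumes the same construction as the answer and shows that each entry of ids is the starting index of a segment in Z. It also builds the same matrix with embed() from the stats package, as one possible alternative to the sapply() call. The reduced sizes, the set.seed() call, and the names Zmat2 and best are illustrative assumptions and not part of the original answer.

```r
# Illustrative sketch only: sizes are reduced so it runs instantly.
set.seed(1)            # assumption: fixed seed, for reproducibility only
nZ <- 2000
nq <- 30
Z <- rnorm(nZ)
q <- seq_len(nq)

# Same construction as in the answer: column k holds Z[k:(k + nq - 1)].
Zmat <- sapply(seq_len(nZ - nq + 1),
               FUN = function(X) Z[seq(from = X, length.out = nq)])

# Alternative construction with stats::embed(). Each row of embed(Z, nq)
# is a window in reversed order, so reverse the columns and transpose.
Zmat2 <- t(embed(Z, nq)[, nq:1])

# Correlation of q with every segment, then the 5 best starting positions.
Cors <- cor(q, Zmat)
ids <- order(Cors, decreasing = TRUE)[1:5]

# Recover the best-matching segment from its starting index in Z.
best <- Z[ids[1]:(ids[1] + nq - 1)]
```

If the two constructions agree, all.equal(Zmat, Zmat2) returns TRUE, and cor(q, best) matches Cors[ids[1]]. Note that when q is itself taken from Z, as in the question, the segment q came from (and segments overlapping it) will probably rank near the top, so you may want to exclude those positions from ids.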