R: Efficiently locating time series segments with maximal cross-correlation to input segment?


Question

I have a long numerical time series of approximately 200,000 rows (let's call it Z).

In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query point q.

I want to locate within Z the y (~300) time series segments of length x that are most correlated with q.

What is an efficient way to accomplish this?

Recommended answer

The code below finds the 300 segments you are looking for and runs in 8 seconds on my none-too-powerful Windows laptop, so it should be fast enough for your purposes.

First, it constructs a 30-by-199971 matrix (Zmat) whose columns contain all of the length-30 "time series segments" you want to examine. A single call to cor(), operating on the vector q and the matrix Zmat, then calculates all of the desired correlation coefficients. Finally, the resulting vector is examined to identify the 300 segments having the highest correlation coefficients.
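To make the column layout concrete, here is a toy sketch of the same sliding-window construction on a six-point series with a window of three. The names Z_toy, n_win and Zmat_toy, and the values themselves, are illustrative assumptions and are not part of the original answer.

```r
# Toy series and window length (illustrative values only)
Z_toy <- c(10, 20, 30, 40, 50, 60)
n_win <- 3

# One column per starting position, built the same way as Zmat in the answer
Zmat_toy <- sapply(seq_len(length(Z_toy) - n_win + 1),
                   FUN = function(i) Z_toy[seq(from = i, length.out = n_win)])

# 3 rows (window length) by 4 columns (starting positions 1 to 4)
print(dim(Zmat_toy))
print(Zmat_toy)
```

Column k holds observations k through k + n_win - 1, so a column index can be read directly as the starting position of a segment.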

# Simulate data
nZ <- 200000
nq <- 30
Z <- rnorm(nZ)
q <- seq_len(nq)

# From Z, construct a 30 by 199971 matrix, in which each column is a
# "time series segment". Column 1 contains observations 1:30, column 2
# contains observations 2:31, and so on through the end of the series.
Zmat <- sapply(seq_len(nZ - nq + 1),  
               FUN = function(X) Z[seq(from = X, length.out = nq)])

# Calculate the correlation of q with every column ("time series segment").
Cors <- cor(q, Zmat)

# Extract the starting position of the 300 most highly correlated segments    
ids <- order(Cors, decreasing=TRUE)[1:300]

# Maybe try something like the following to confirm that you have
# selected the most highly correlated segments.
hist(Cors, breaks=100)
hist(Cors[ids], col="red", add=TRUE)
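Since the question runs many queries in a loop, one possible arrangement is to build the segment matrix once and reuse it for every q. The sketch below assumes a hypothetical helper named top_segments() and uses a smaller simulated series so that it runs quickly; it is one way to organise the steps above, not a tuned implementation.

```r
set.seed(1)               # reproducible simulated data (assumption)
nZ <- 2000                # shorter than the real series, for a quick run
nq <- 30
n_top <- 5
Z <- rnorm(nZ)

# Build the segment matrix once; it does not depend on the query
Zmat <- sapply(seq_len(nZ - nq + 1),
               FUN = function(i) Z[seq(from = i, length.out = nq)])

# Hypothetical helper: starting positions and correlations of the
# n_top segments most correlated with q
top_segments <- function(q, Zmat, n_top) {
  cors <- as.vector(cor(q, Zmat))
  ids <- order(cors, decreasing = TRUE)[seq_len(n_top)]
  data.frame(start = ids, cor = cors[ids])
}

# A query taken from Z itself, as in the question
q <- Z[101:130]
res <- top_segments(q, Zmat, n_top)
print(res)

# Spot check: recompute one reported correlation directly from Z
check <- cor(q, Z[res$start[2]:(res$start[2] + nq - 1)])
```

Because q is itself a segment of Z, its own starting position comes back first with a correlation of 1. With real, autocorrelated data the segments overlapping the query are also likely to rank highly, so it may be worth excluding starting positions close to the query before taking the top matches.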
