Aligning sequences with missing values


Question

The language I'm using is R, but you don't necessarily need to know R to answer the question.

Question: I have a sequence that can be considered the ground truth, and another sequence that is a shifted version of the first, with some values missing. I'd like to know how to align the two.

Setup

I have a sequence ground.truth that is basically a set of times:

ground.truth <- rep( seq(1,by=4,length.out=10), 5 ) +
                rep( seq(0,length.out=5,by=4*10+30), each=10 )

Think of ground.truth as the times at which I do the following:

{take a sample every 4 seconds for 10 times, then wait 30 seconds} x 5
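The schedule can be checked directly from the vector. The sketch below rebuilds ground.truth exactly as in the question and tabulates the gaps between consecutive times; the names gaps and gap_counts are introduced here for illustration only. With this construction the gap across a block boundary comes out as 34 (the 30-second wait plus one 4-second step).

```r
# Rebuild the sampling schedule from the question.
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
                rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)

# Gaps between consecutive sample times (illustrative helper names).
gaps <- diff(ground.truth)
gap_counts <- table(gaps)

# Gaps of 4 occur inside a block, gaps of 34 between blocks.
print(gap_counts)
```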

I have a second sequence, observations, which is ground.truth shifted by a constant, with 20% of the values missing:

nSamples <- length(ground.truth)
idx_to_keep <- sort(sample( 1:nSamples, .8*nSamples ))
theLag <- runif(1)*100
observations <- ground.truth[idx_to_keep] + theLag
nObs     <- length(observations)

If I plot these vectors, this is what it looks like (remember, think of these as times):

[Figure not preserved: ground.truth and observations plotted as times.]

What I've tried

I want to:

  • calculate the shift (theLag in my example above)
  • calculate a vector idx such that ground.truth[idx] == observations - theLag

First, assume we know theLag. Note that ground.truth[1] is not necessarily observations[1] - theLag. In fact, we have ground.truth[1] == observations[1+lagI] - theLag for some lagI.

To calculate this, I thought I'd use cross-correlation (the ccf function).

However, whenever I do this, the lag with the maximum cross-correlation is 0, which would mean ground.truth[1] == observations[1] - theLag. But I've tried this in examples where I explicitly made sure that observations[1] - theLag is not ground.truth[1] (i.e. I modified idx_to_keep so that it does not contain 1).

The shift theLag shouldn't affect the cross-correlation (isn't ccf(x,y) == ccf(x,y-constant)?), so I was going to work it out later.
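The invariance claim can be checked on toy data. The sketch below assumes two arbitrary random-walk series (not the data from the question) and an arbitrary constant of 42; ccf removes the series means by default, so subtracting a constant from one input should leave the estimated cross-correlations unchanged up to floating-point tolerance.

```r
set.seed(1)                     # arbitrary seed, only for reproducibility
x <- cumsum(rnorm(50))          # toy series, not the data from the question
y <- cumsum(rnorm(50))

r1 <- ccf(x, y,      plot = FALSE)
r2 <- ccf(x, y - 42, plot = FALSE)   # 42 is an arbitrary constant

# Compare the two sets of cross-correlations with a numerical tolerance.
shift_invariant <- isTRUE(all.equal(r1$acf, r2$acf))
print(shift_invariant)
```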

Perhaps I'm misunderstanding something, though, because observations doesn't have as many values in it as ground.truth? Even in the simpler case where I set theLag == 0, the cross-correlation function still fails to identify the correct lag, which leads me to believe I'm thinking about this wrong.
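One thing that may help when reasoning about this: ccf coerces both inputs to regularly spaced series and correlates them position by position, so it never interprets the values themselves as timestamps, and with inputs of unequal length it works on the overlapping index range. The sketch below reproduces the theLag == 0 case with a fixed, hypothetical drop pattern (every fifth sample starting at index 3, chosen here instead of sample() so the run is repeatable) and prints the lag at which the cross-correlation peaks.

```r
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
                rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)

# Hypothetical fixed drop pattern (20% missing), used instead of sample().
keep <- setdiff(seq_along(ground.truth), seq(3, 50, by = 5))
observations <- ground.truth[keep]          # theLag == 0 in this sketch

# ccf compares the series by index position, not by time value.
r <- ccf(ground.truth, observations, plot = FALSE)
best_lag <- r$lag[which.max(r$acf)]
print(best_lag)
```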

Does anyone have a general methodology for going about this, or know of some R functions/packages that could help?

Thanks a lot.

Recommended answer

For the lag, you can compute all the pairwise differences (distances) between your two sets of points:

diffs <- outer(observations, ground.truth, '-')

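Your lag should be the value that appears length(observations) times, since every observation differs from its own ground-truth time by exactly theLag: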

which(table(diffs) == length(observations))
# 55.715382960625 
#              86 

Double-check:

theLag
# [1] 55.71538
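The lag-finding step above returns the lag as a table name (a string) and relies on the repeated differences being tabulated as the same value. A slightly more defensive variant, sketched below, rounds the differences before tabulating and converts the most frequent one back to a number. The fixed lag of 55.715, the fixed drop pattern, the 6-decimal rounding, and the names keep, counts and estLag are all assumptions made here for a repeatable illustration.

```r
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
                rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)

# Hypothetical fixed inputs so the example is repeatable.
keep   <- setdiff(seq_along(ground.truth), seq(3, 50, by = 5))
theLag <- 55.715
observations <- ground.truth[keep] + theLag

# All pairwise differences; the true lag occurs once per observation.
diffs  <- outer(observations, ground.truth, "-")
counts <- table(round(diffs, 6))     # rounding absorbs floating-point noise

# Most frequent difference, converted from table name back to numeric.
estLag <- as.numeric(names(counts)[which.max(counts)])
print(estLag)
```

Taking the most frequent difference rather than testing for a count of exactly length(observations) also tolerates a few spurious or duplicated observations, at the cost of possible ties in degenerate cases.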

The second part of your question is easy once you have found theLag:

idx <- which(ground.truth %in% (observations - theLag))
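Because observations - theLag is computed in floating point, an exact %in% comparison can in principle miss a match when theLag is not an integer. A tolerant alternative, sketched here under the same hypothetical fixed inputs (lag 55.715, every fifth sample dropped), maps each shifted observation to its nearest ground-truth time; the names shifted, idx and max_err are illustrative.

```r
ground.truth <- rep(seq(1, by = 4, length.out = 10), 5) +
                rep(seq(0, length.out = 5, by = 4 * 10 + 30), each = 10)

# Hypothetical fixed inputs so the example is repeatable.
keep   <- setdiff(seq_along(ground.truth), seq(3, 50, by = 5))
theLag <- 55.715
observations <- ground.truth[keep] + theLag

# Undo the shift, then match each value to the nearest ground-truth time.
shifted <- observations - theLag
idx <- vapply(shifted,
              function(t) which.min(abs(ground.truth - t)),
              integer(1))

# Largest residual after matching; it should be at floating-point noise level.
max_err <- max(abs(ground.truth[idx] - shifted))
print(max_err)
```

Checking max_err against a small threshold is a cheap way to confirm that the estimated lag was right before trusting idx.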
