Check time series incongruencies

Problem description

Let's say that we have the following data frame:

# Build the data frame directly: cbind() on mixed character and numeric
# vectors would coerce Visit and Age to character.
x <- data.frame(ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
                Visit = c(1,2,3,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
                Age   = c(14,28,42,14,46,64,71,85,14,28,51,84,66,22,38,32,40,42))

The first column is the subject ID, the second is the visit number, and the third is the subject's age at each consecutive visit.

What would be the easiest way to find visits where the age is inconsistent with the age at the previous visit? For example, in row 13 subject C is 66 years old although he was already 84 at the previous visit, and in row 16 subject D is 32 although he was already 38 at the previous visit.

How can I highlight the potential errors and remove rows 13 and 16?

I have tried to aggregate by ID and look at the difference between ages across visits, but this seems hard to me because the error could occur at any visit.

Recommended answer

How about

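# Per subject: sort by Visit, then keep the first visit and each visit whose Age increased.
df <- do.call(rbind.data.frame, lapply(split(x, x$ID), function(w) {
    w <- w[order(w$Visit), ]  # sort first so diff() indices refer to the sorted rows
    w[c(1, which(diff(w$Age) > 0) + 1), ]
}))
df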
#    ID Visit Age
#A.1   A     1  14
#A.2   A     2  28
#A.3   A     3  42
#B.4   B     1  14
#B.5   B     2  46
#B.6   B     3  64
#B.7   B     4  71
#B.8   B     5  85
#C.9   C     1  14
#C.10  C     2  28
#C.11  C     3  51
#C.12  C     4  84
#D.14  D     1  22
#D.15  D     2  38
#D.17  D     4  40
#D.18  D     5  42    

Explanation: We split the data frame on the ID column, order each subset by Visit, calculate the differences between successive Age values, and keep only each subject's first visit plus the rows where the difference is > 0 (i.e. Age is increasing); rbind then combines the subsets into the final data frame.
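
The question also asks how to highlight the suspect visits rather than only drop them. A minimal base R sketch of that idea follows; the column names suspect and suspect_max are made up for illustration, and the stricter running-maximum check is an assumption about what "wrong" could mean, not something stated in the question.

```r
# Same example data as in the question, built with numeric Visit and Age.
x <- data.frame(
  ID    = rep(c("A", "B", "C", "D"), times = c(3, 5, 5, 5)),
  Visit = c(1:3, 1:5, 1:5, 1:5),
  Age   = c(14, 28, 42, 14, 46, 64, 71, 85, 14, 28, 51, 84, 66, 22, 38, 32, 40, 42)
)
x <- x[order(x$ID, x$Visit), ]  # consecutive rows = consecutive visits per subject

# Flag a visit when Age did not increase relative to the previous visit.
# ave() returns numeric 0/1 here, hence the comparison with 1.
x$suspect <- ave(x$Age, x$ID,
                 FUN = function(a) c(FALSE, diff(a) <= 0)) == 1

# Stricter variant (an assumption, not part of the answer above): compare
# each Age with the highest Age seen at any earlier visit of that subject.
x$suspect_max <- ave(x$Age, x$ID,
                     FUN = function(a) c(FALSE, a[-1] <= cummax(a)[-length(a)])) == 1

x[x$suspect, ]             # highlight the potential errors
clean <- x[!x$suspect, ]   # remove them
```

On this data both flags mark the same visits. They can differ when one implausibly high age is followed by several correct ones: the consecutive-difference check flags only the first visit after the spike, while the running-maximum check flags every later visit that is still below it. In that situation the spike itself may be the real error, so the flagged rows are best treated as candidates for review.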
