R -- Carry last observation forward n times
Problem description
I am attempting to carry non-missing observations forward and populate the next two missing observations (although I imagine a solution to this problem would be broadly applicable to carrying observations forward through n rows...).
In the example data frame below, I would like to carry forward (propagate) the flag_a and flag_b values for two rows within each id. Here is an example of my data with the desired output included:
id <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
flag_a <- as.numeric(c(NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA))
flag_b <- as.numeric(c(NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA))
flag_a_desired_output <- as.numeric(c(NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA))
flag_b_desired_output <- as.numeric(c(NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA))
data <- data.frame(cbind(id, flag_a, flag_b, flag_a_desired_output, flag_b_desired_output))
I have attempted to use the following last observation carried forward (LOCF) function; however, as expected it populates all missing rows rather than just the next two.
library(zoo)  # provides na.locf()
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)
data <- transform(data, flag_a_locf = ave(flag_a, id, FUN = na.locf.na))
data <- transform(data, flag_b_locf = ave(flag_b, id, FUN = na.locf.na))
Any thoughts on how to go about this would be greatly appreciated.
Recommended answer
This is not the prettiest thing, but here's how I handle problems like this:
library(data.table)
data <- data.table(data)
## number the rows within each id
data[, rowid := seq_len(.N), by = id]
## flag_a: find the first flagged row per id, then fill it and the next two rows
data[, flag_a_min := min(rowid[!is.na(flag_a)]), by = id]
data[, flag_a_max := flag_a_min + 2]
data[rowid <= flag_a_max & rowid >= flag_a_min, flag_a := min(na.omit(flag_a))]
## flag_b: same steps
data[, flag_b_min := min(rowid[!is.na(flag_b)]), by = id]
data[, flag_b_max := flag_b_min + 2]
data[rowid <= flag_b_max & rowid >= flag_b_min, flag_b := min(na.omit(flag_b))]
## clean up the helper columns
data[, c("rowid", "flag_a_min", "flag_a_max", "flag_b_min", "flag_b_max") := NULL]
> data
    id flag_a flag_b flag_a_desired_output flag_b_desired_output
 1:  1     NA     NA                    NA                    NA
 2:  1     NA     NA                    NA                    NA
 3:  1      1     NA                     1                    NA
 4:  1      1      1                     1                     1
 5:  1      1      1                     1                     1
 6:  1     NA      1                    NA                     1
 7:  1     NA     NA                    NA                    NA
 8:  1     NA     NA                    NA                    NA
 9:  1     NA     NA                    NA                    NA
10:  1     NA     NA                    NA                    NA
11:  2     NA     NA                    NA                    NA
12:  2      1     NA                     1                    NA
13:  2      1     NA                     1                    NA
14:  2      1      1                     1                     1
15:  2     NA      1                    NA                     1
16:  2     NA      1                    NA                     1
17:  2     NA     NA                    NA                    NA
18:  2     NA     NA                    NA                    NA
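The approach above fills from the first flagged row of each id only. For the more general case raised in the question (carrying every observation forward through n rows), a base R sketch could look like the following. It assumes that each non-missing value should be carried through at most n following rows and that a later non-missing value restarts the window. The helper name carry_forward_n is made up for illustration and is not part of any package.

```r
# Hypothetical helper: carry each non-missing value forward through at most
# n following positions; anything further away stays NA.
carry_forward_n <- function(x, n) {
  idx <- seq_along(x)
  # Index of the most recent non-missing value at or before each position
  # (0 while no value has been observed yet)
  last_obs <- cummax(ifelse(is.na(x), 0L, idx))
  # Fill only the positions that lie within n steps of that observation
  fill <- last_obs > 0 & (idx - last_obs) <= n
  out <- x
  out[fill] <- x[last_obs[fill]]
  out
}

# Same sample data as in the question
id     <- c(rep(1, 10), rep(2, 8))
flag_a <- c(NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA)
flag_b <- c(NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA)

# ave() applies the helper within each id, so values never leak across ids
flag_a_filled <- ave(flag_a, id, FUN = function(v) carry_forward_n(v, 2))
flag_b_filled <- ave(flag_b, id, FUN = function(v) carry_forward_n(v, 2))
```

On the sample data, flag_a_filled and flag_b_filled should match flag_a_desired_output and flag_b_desired_output. Changing the second argument changes how many rows each value is carried.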