Replace NA when last and next non-NA values are equal
Problem description
I have a sample table with some, but not all, NA values that need to be replaced.
> dat
id message index
1 1 <NA> 1
2 1 foo 2
3 1 foo 3
4 1 <NA> 4
5 1 foo 5
6 1 <NA> 6
7 2 <NA> 1
8 2 baz 2
9 2 <NA> 3
10 2 baz 4
11 2 baz 5
12 2 baz 6
13 3 bar 1
14 3 <NA> 2
15 3 <NA> 3
16 3 bar 4
17 3 <NA> 5
18 3 bar 6
19 3 <NA> 7
20 3 qux 8
My objective is to replace the NA values that are surrounded by the same message, using the first appearance of the message (the smallest index value) and the last appearance of the message (the largest index value) within each id.
Sometimes the NA sequences are only of length 1; other times they can be very long. Regardless, all of the NAs that are "sandwiched" between the same value of message before and after them should be filled in.
The output for the above incomplete table would be:
> output
id message index
1 1 <NA> 1
2 1 foo 2
3 1 foo 3
4 1 foo 4
5 1 foo 5
6 1 <NA> 6
7 2 <NA> 1
8 2 baz 2
9 2 baz 3
10 2 baz 4
11 2 baz 5
12 2 baz 6
13 3 bar 1
14 3 bar 2
15 3 bar 3
16 3 bar 4
17 3 bar 5
18 3 bar 6
19 3 <NA> 7
20 3 qux 8
Any guidance using data.table or dplyr here would be helpful, as I'm not even sure where to begin.
The furthest I got was subsetting by unique messages, but this method does not take id into account:
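# get the distinct messages
messages <- unique(dat$message)
# drop NA
messages <- messages[!is.na(messages)]
# print the subset of dat for each message
# (which() drops the NA comparisons, which would otherwise produce all-NA rows)
for (i in seq_along(messages)) {
  print(dat[which(dat$message == messages[i]), ])
}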
Data:
dput(dat)
structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 3, 3), message = c(NA, "foo", "foo", NA, "foo",
NA, NA, "baz", NA, "baz", "baz", "baz", "bar", NA, NA, "bar",
NA, "bar", NA, "qux"), index = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4,
5, 6, 1, 2, 3, 4, 5, 6, 7, 8)), row.names = c(NA, -20L), class = "data.frame")
Recommended answer
Perform an na.locf0 both forwards and backwards; if the two results agree, use the common value, otherwise use NA. The grouping by id is done with ave.
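library(zoo)

# Fill each NA only when the last non-NA value before it (carried forward)
# equals the next non-NA value after it (carried backward).
filler <- function(x) {
  forward <- na.locf0(x)                    # last observation carried forward
  backward <- na.locf0(x, fromLast = TRUE)  # next observation carried backward
  ifelse(forward == backward, forward, NA)  # NA wherever they differ or either is NA
}

# apply filler separately within each id
transform(dat, message = ave(message, id, FUN = filler))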
Giving:
id message index
1 1 <NA> 1
2 1 foo 2
3 1 foo 3
4 1 foo 4
5 1 foo 5
6 1 <NA> 6
7 2 <NA> 1
8 2 baz 2
9 2 baz 3
10 2 baz 4
11 2 baz 5
12 2 baz 6
13 3 bar 1
14 3 bar 2
15 3 bar 3
16 3 bar 4
17 3 bar 5
18 3 bar 6
19 3 <NA> 7
20 3 qux 8
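If zoo is not available, the same forward/backward comparison can be sketched in base R. The helpers locf_base() and filler_base() below are hypothetical names introduced for illustration, not part of any package; locf_base() is assumed to mimic na.locf0() only for plain vectors (leading NAs stay NA).

```r
# Data from the question (dput output)
dat <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
  3, 3, 3, 3, 3, 3, 3), message = c(NA, "foo", "foo", NA, "foo",
  NA, NA, "baz", NA, "baz", "baz", "baz", "bar", NA, NA, "bar",
  NA, "bar", NA, "qux"), index = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4,
  5, 6, 1, 2, 3, 4, 5, 6, 7, 8)), row.names = c(NA, -20L), class = "data.frame")

# Hypothetical helper: last observation carried forward, base R only.
# Leading NAs (no earlier non-NA value) stay NA.
locf_base <- function(x) {
  vals <- x[!is.na(x)]       # the non-NA values, in order
  idx <- cumsum(!is.na(x))   # how many non-NA values have been seen so far
  idx[idx == 0] <- NA        # nothing seen yet -> NA index -> NA value
  vals[idx]
}

# Hypothetical helper: keep a value only where the forward and backward fills agree.
filler_base <- function(x) {
  forward <- locf_base(x)
  backward <- rev(locf_base(rev(x)))   # fill from the end by reversing twice
  same <- !is.na(forward) & !is.na(backward) & forward == backward
  forward[!same] <- NA                 # subassignment keeps the character type
  forward
}

# apply filler_base separately within each id
res <- dat
res$message <- ave(res$message, res$id, FUN = filler_base)
print(res)
```

Subassignment is used here instead of ifelse() so that a group whose messages are all NA still returns a character vector (ifelse() would return a logical one in that case). Because filler_base() returns a vector of the same length and type as its input, it should also drop into a dplyr pipeline such as dat %>% group_by(id) %>% mutate(message = filler_base(message)).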