Replace NA when last and next non-NA values are equal

Question

I have a sample table in which some, but not all, NA values need to be replaced.

> dat
   id message index
1   1    <NA>     1
2   1     foo     2
3   1     foo     3
4   1    <NA>     4
5   1     foo     5
6   1    <NA>     6
7   2    <NA>     1
8   2     baz     2
9   2    <NA>     3
10  2     baz     4
11  2     baz     5
12  2     baz     6
13  3     bar     1
14  3    <NA>     2
15  3    <NA>     3
16  3     bar     4
17  3    <NA>     5
18  3     bar     6
19  3    <NA>     7
20  3     qux     8

My objective is to replace, within each id, the NA values that are surrounded by the same "message", i.e. the NAs that lie between the first appearance of a message (its smallest index value) and its last appearance (its largest index value).

Sometimes the NA sequences have length 1; other times they can be very long. Either way, every NA that is "sandwiched" between the same "message" value before and after it should be filled in.

The output for the above incomplete table would be:

 > output
   id message index
1   1    <NA>     1
2   1     foo     2
3   1     foo     3
4   1     foo     4
5   1     foo     5
6   1    <NA>     6
7   2    <NA>     1
8   2     baz     2
9   2     baz     3
10  2     baz     4
11  2     baz     5
12  2     baz     6
13  3     bar     1
14  3     bar     2
15  3     bar     3
16  3     bar     4
17  3     bar     5
18  3     bar     6
19  3    <NA>     7
20  3     qux     8

Any guidance using data.table or dplyr here would be helpful, as I'm not even sure where to begin.

The furthest I got was subsetting by unique messages, but this method does not take id into account:

#get distinct messages
messages = unique(dat$message)

#remove NA
messages = messages[!is.na(messages)]

#subset dat for each message
for (m in messages) {
  # which() drops the rows where the comparison is NA
  print(dat[which(dat$message == m), ])
}

Data:

> dput(dat)
structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 
3, 3, 3, 3, 3, 3, 3), message = c(NA, "foo", "foo", NA, "foo", 
NA, NA, "baz", NA, "baz", "baz", "baz", "bar", NA, NA, "bar", 
NA, "bar", NA, "qux"), index = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 
5, 6, 1, 2, 3, 4, 5, 6, 7, 8)), row.names = c(NA, -20L), class = "data.frame")

Recommended answer

Perform an na.locf0 both forwards and backwards; where the two results are the same, use the common value, otherwise use NA. The grouping is done with ave.

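library(zoo)

# Fill each NA only when the nearest non-NA values on both sides agree.
filler <- function(x) {
  forward <- na.locf0(x)                    # last observation carried forward
  backward <- na.locf0(x, fromLast = TRUE)  # next observation carried backward
  ifelse(forward == backward, forward, NA)  # NA where they differ or either is missing
}

# ave() applies filler separately within each id.
transform(dat, message = ave(message, id, FUN = filler))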

giving:

   id message index
1   1    <NA>     1
2   1     foo     2
3   1     foo     3
4   1     foo     4
5   1     foo     5
6   1    <NA>     6
7   2    <NA>     1
8   2     baz     2
9   2     baz     3
10  2     baz     4
11  2     baz     5
12  2     baz     6
13  3     bar     1
14  3     bar     2
15  3     bar     3
16  3     bar     4
17  3     bar     5
18  3     bar     6
19  3    <NA>     7
20  3     qux     8
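
If adding zoo is not an option, the same forward/backward idea can be sketched in base R. This is only an illustration: the helpers locf and fill_between are hypothetical names made up for this sketch, not part of any package, and the data frame is rebuilt here with the same values as in the question (index is stored as integer rather than double).

```r
# Sample data from the question, rebuilt compactly (same values).
dat <- data.frame(
  id = rep(c(1, 2, 3), times = c(6, 6, 8)),
  message = c(NA, "foo", "foo", NA, "foo", NA,
              NA, "baz", NA, "baz", "baz", "baz",
              "bar", NA, NA, "bar", NA, "bar", NA, "qux"),
  index = c(1:6, 1:6, 1:8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 (NA) is used before any value is seen
}

# Hypothetical helper: fill a position only when the nearest non-NA
# values before and after it exist and are equal.
fill_between <- function(x) {
  forward  <- locf(x)
  backward <- rev(locf(rev(x)))
  agree <- !is.na(forward) & !is.na(backward) & forward == backward
  x[agree] <- forward[agree]
  x
}

# ave() applies fill_between separately within each id.
result <- transform(dat, message = ave(message, id, FUN = fill_between))
print(result)
```

Unlike the ifelse version, this one writes into a copy of x, so the column keeps its original type even for a group that contains only NA.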

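Since the question asks about dplyr, here is a rough equivalent, assuming the dplyr and tidyr packages are installed. It relies on tidyr::fill with .direction = "down" and "up", which is applied within each group of a grouped data frame. The temporary columns down and up are names invented for this sketch.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt compactly (same values).
dat <- data.frame(
  id = rep(c(1, 2, 3), times = c(6, 6, 8)),
  message = c(NA, "foo", "foo", NA, "foo", NA,
              NA, "baz", NA, "baz", "baz", "baz",
              "bar", NA, NA, "bar", NA, "bar", NA, "qux"),
  index = c(1:6, 1:6, 1:8),
  stringsAsFactors = FALSE
)

result_dplyr <- dat %>%
  group_by(id) %>%
  mutate(down = message, up = message) %>%  # two working copies of the column
  fill(down, .direction = "down") %>%       # last non-NA value carried forward
  fill(up, .direction = "up") %>%           # next non-NA value carried backward
  # keep the value only where both directions agree; NA otherwise
  mutate(message = if_else(!is.na(down) & down == up, down, NA_character_)) %>%
  ungroup() %>%
  select(-down, -up)

print(result_dplyr)
```

The result is a tibble rather than a plain data frame; wrap it in as.data.frame() if that matters downstream.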