Replace duplicate values with NA in time series data using dplyr
Question
My data seems a bit different from the data in other, similar posts.
box_num date x y
1-Q 2018-11-18 20.2 8
1-Q 2018-11-25 21.23 7.2
1-Q 2018-12-2 21.23 23
98-L 2018-11-25 0.134 9.3
98-L 2018-12-2 0.134 4
76-GI 2018-12-2 22.734 4.562
76-GI 2018-12-9 28 4.562
Here I would like to replace the repeated values with NA in both the x and y columns. The code I have tried using dplyr:
(1)
df <- df %>% group_by(box_num) %>% arrange(box_num,date) %>%
  mutate(df$x[duplicated(df$x),] <- NA)
It creates a new column filled with NAs instead of just replacing the repeated values with NA.
(2)
df <- df %>% group_by(box_num) %>% arrange(box_num,date) %>%
  distinct(x,.keep_all = TRUE)
The second one returns only the rows that are not duplicated, so part of the time series is lost.
Desired output:
box_num date x y
1-Q 2018-11-18 20.2 8
1-Q 2018-11-25 21.23 7.2
1-Q 2018-12-2 NA 23
98-L 2018-11-25 0.134 9.3
98-L 2018-12-2 NA 4
76-GI 2018-12-2 22.734 4.562
76-GI 2018-12-9 28 NA
Recommended answer
Using dplyr, we can group_by box_num, use mutate_at on the x and y columns, and replace the duplicated values with NA.
library(dplyr)

df %>%
  group_by(box_num) %>%
  # funs() has been deprecated since dplyr 0.8.0; a formula lambda does the same job
  mutate_at(vars(x:y), ~ replace(., duplicated(.), NA))
# box_num date x y
# <fct> <fct> <dbl> <dbl>
#1 1-Q 2018-11-18 20.2 8
#2 1-Q 2018-11-25 21.2 7.2
#3 1-Q 2018-12-2 NA 23
#4 98-L 2018-11-25 0.134 9.3
#5 98-L 2018-12-2 NA 4
#6 76-GI 2018-12-2 22.7 4.56
#7 76-GI 2018-12-9 28 NA
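In more recent dplyr releases (1.0.0 and later), mutate_at has been superseded by across(). As a rough sketch of the same idea, assuming such a version is installed, the data frame from the question can be rebuilt and processed as follows. The name result is only illustrative, and the dates are kept as plain character strings here.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  box_num = c("1-Q", "1-Q", "1-Q", "98-L", "98-L", "76-GI", "76-GI"),
  date    = c("2018-11-18", "2018-11-25", "2018-12-2",
              "2018-11-25", "2018-12-2", "2018-12-2", "2018-12-9"),
  x       = c(20.2, 21.23, 21.23, 0.134, 0.134, 22.734, 28),
  y       = c(8, 7.2, 23, 9.3, 4, 4.562, 4.562),
  stringsAsFactors = FALSE
)

# Within each box_num, keep the first occurrence of a value and
# set every later repeat to NA, separately for x and y
result <- df %>%
  group_by(box_num) %>%
  mutate(across(c(x, y), ~ replace(.x, duplicated(.x), NA))) %>%
  ungroup()

result
```

Because duplicated() flags later occurrences in row order, the rows should already be sorted by date within each box_num (as they are in the example) so that the earliest observation is the one that is kept.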
A base R option (which might not be the best in this case) would be:
cols <- c("x", "y")
# For each column, blank out repeated values within each box_num group
df[cols] <- sapply(df[cols], function(x)
  ave(x, df$box_num, FUN = function(x) replace(x, duplicated(x), NA)))