Replace duplicate values with NA in time series data using dplyr
Problem description
My data seems a bit different from that in other similar posts.
box_num date x y
1-Q 2018-11-18 20.2 8
1-Q 2018-11-25 21.23 7.2
1-Q 2018-12-2 21.23 23
98-L 2018-11-25 0.134 9.3
98-L 2018-12-2 0.134 4
76-GI 2018-12-2 22.734 4.562
76-GI 2018-12-9 28 4.562
Here I would like to replace the repeated values with NA in both the x and y columns. The code I have tried using dplyr:
(1)df <- df %>% group_by(box_num) %>% arrange(box_num,date) %>%
mutate(df$x[duplicated(df$x),] <- NA)
It creates a new column filled with NAs instead of replacing only the repeated values with NA.
(2)df <- df %>% group_by(box_num) %>% arrange(box_num,date) %>%
distinct(x,.keep_all = TRUE)
The second one returns only the rows that are not duplicated, so the time series is lost. Desired output:
box_num date x y
1-Q 2018-11-18 20.2 8
1-Q 2018-11-25 21.23 7.2
1-Q 2018-12-2 NA 23
98-L 2018-11-25 0.134 9.3
98-L 2018-12-2 NA 4
76-GI 2018-12-2 22.734 4.562
76-GI 2018-12-9 28 NA
Recommended answer
Using dplyr, we can group_by box_num and use mutate_at on the x and y columns to replace the duplicated values with NA.
library(dplyr)

df %>%
  group_by(box_num) %>%
  # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
  mutate_at(vars(x:y), ~ replace(., duplicated(.), NA))
# box_num date x y
# <fct> <fct> <dbl> <dbl>
#1 1-Q 2018-11-18 20.2 8
#2 1-Q 2018-11-25 21.2 7.2
#3 1-Q 2018-12-2 NA 23
#4 98-L 2018-11-25 0.134 9.3
#5 98-L 2018-12-2 NA 4
#6 76-GI 2018-12-2 22.7 4.56
#7 76-GI 2018-12-9 28 NA
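On recent dplyr releases, mutate_at() is superseded by across(). The following is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later. The data frame is rebuilt by hand from the question's table (with date kept as plain text), and the name `result` is only illustrative:

```r
library(dplyr)

# Sample data rebuilt from the question's table
df <- data.frame(
  box_num = c("1-Q", "1-Q", "1-Q", "98-L", "98-L", "76-GI", "76-GI"),
  date    = c("2018-11-18", "2018-11-25", "2018-12-2",
              "2018-11-25", "2018-12-2", "2018-12-2", "2018-12-9"),
  x       = c(20.2, 21.23, 21.23, 0.134, 0.134, 22.734, 28),
  y       = c(8, 7.2, 23, 9.3, 4, 4.562, 4.562),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(box_num) %>%
  # Within each box, keep the first occurrence and blank out later repeats
  mutate(across(c(x, y), ~ replace(.x, duplicated(.x), NA))) %>%
  ungroup()
```

All seven rows are kept, so the time series stays intact; only the repeated x and y values within each box_num become NA.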
A base R option (which might not be the best in this case) would be:
cols <- c("x", "y")
df[cols] <- sapply(df[cols], function(x)
ave(x, df$box_num, FUN = function(x) replace(x, duplicated(x), NA)))
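One caveat: duplicated() flags every later repeat within a group, not only back-to-back ones, so a value that comes back after changing would also become NA. If only consecutive repeats should be blanked (an assumption about the intent; the question's sample data does not distinguish the two cases), a sketch along these lines may help. The helper name `na_consecutive` and the toy vectors are hypothetical:

```r
# Hypothetical helper: NA out a value only when it equals the one just before it
na_consecutive <- function(v) {
  if (length(v) < 2) return(v)
  repeated <- c(FALSE, v[-1] == v[-length(v)])
  replace(v, which(repeated), NA)
}

# Toy data: the value 5 returns in group "a" after a different value
vals  <- c(5, 5, 7, 5, 5)
group <- c("a", "a", "a", "a", "b")

# Any later repeat within the group is blanked
by_duplicated  <- ave(vals, group, FUN = function(v) replace(v, duplicated(v), NA))
# Only back-to-back repeats within the group are blanked
by_consecutive <- ave(vals, group, FUN = na_consecutive)
```

Here the fourth element differs between the two results: duplicated() blanks it because 5 already appeared in group "a", while the consecutive check keeps it because the preceding value was 7.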