Remove a column with bad quality data if there is good quality data for the same period
Problem description
I have the following problem: my data contains both good and bad quality data. For example, for the time 2017-12-31 I have a row with good quality data (Quality = a) with the value 800 and a row with bad quality data (Quality = b) with the value 750.
  Quality       Time Value
1       a 2017-12-31   800
2       a 2018-12-31   500
3       b 2017-12-31   750
4       b 2018-12-31   480
5       b 2019-12-31   200
Sample data frame:
df <- data.frame(Quality = c("a", "a", "b", "b", "b"), Time = c("2017-12-31", "2018-12-31", "2017-12-31", "2018-12-31", "2019-12-31"), Value = c(800, 500, 750, 480, 200))
I want to keep the "bad quality" data (Quality = b) only when there is no "good quality" data (Quality = a) for each period (Time).
So the expected output is:
  Quality       Time Value
1       a 2017-12-31   800
2       a 2018-12-31   500
3       b 2019-12-31   200
I tried to solve this problem with an if statement, but failed. My real data has over 10000 rows and multiple dates. Any help is appreciated.
Recommended answer
You can do this with the help of match:
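library(dplyr)

df %>%
  group_by(Time) %>%
  # match() gives the position of the first 'a' and the first 'b' in each group
  # (NA if absent); after na.omit(), first() prefers 'a' and falls back to 'b'
  slice(first(na.omit(match(c('a', 'b'), Quality)))) %>%
  ungroup()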
#   Quality Time       Value
#   <chr>   <chr>      <dbl>
# 1 a       2017-12-31   800
# 2 a       2018-12-31   500
# 3 b       2019-12-31   200
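Note that the slice() approach above keeps exactly one row per Time. If a period can have several good quality rows and all of them should be kept, a base R filter may be an alternative. The following is only a minimal sketch under that assumption, not part of the original answer; the names good_times, keep and result are made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  Quality = c("a", "a", "b", "b", "b"),
  Time    = c("2017-12-31", "2018-12-31", "2017-12-31", "2018-12-31", "2019-12-31"),
  Value   = c(800, 500, 750, 480, 200)
)

# Periods that have at least one good quality ("a") row
good_times <- unique(df$Time[df$Quality == "a"])

# Keep every "a" row, plus "b" rows whose period has no "a" row
keep <- df$Quality == "a" | !(df$Time %in% good_times)

result <- df[keep, ]
rownames(result) <- NULL  # renumber the rows 1..n
result
```

On the sample data this keeps the two "a" rows and the 2019-12-31 "b" row, matching the expected output in the question. Because it is a single vectorised comparison with no grouping, it should also cope comfortably with the 10000+ rows mentioned there.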