Remove rows with bad-quality data if there is good-quality data for the same period

Problem description

I have the following problem: my data contains both good-quality and bad-quality data. For example, for the time 2017-12-31 I have a row of good-quality data (Quality = a) with the value 800 and a row of bad-quality data (Quality = b) with the value 750.

  Quality       Time Value
1       a 2017-12-31   800
2       a 2018-12-31   500
3       b 2017-12-31   750
4       b 2018-12-31   480
5       b 2019-12-31   200

Sample data frame:

df <- data.frame(Quality = c("a", "a", "b", "b", "b"), Time = c("2017-12-31", "2018-12-31", "2017-12-31", "2018-12-31", "2019-12-31"), Value = c(800, 500, 750, 480, 200))

For each period (Time), I want to keep the "bad quality" data (Quality = b) only when there is no "good quality" data (Quality = a) for that period.
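The rule can also be written as a plain row filter, with no grouping and no if statement. This is a minimal base-R sketch (not part of the original answer) that assumes the sample df from the question: it collects the periods that have at least one "a" row, then keeps every "a" row plus any row whose Time is not in that set. The names good_times and result are illustrative only.

```r
# Sample data from the question
df <- data.frame(
  Quality = c("a", "a", "b", "b", "b"),
  Time = c("2017-12-31", "2018-12-31", "2017-12-31", "2018-12-31", "2019-12-31"),
  Value = c(800, 500, 750, 480, 200),
  stringsAsFactors = FALSE
)

# Periods that have at least one good-quality ("a") row
good_times <- unique(df$Time[df$Quality == "a"])

# Keep every "a" row, plus any row whose period has no "a" row at all
result <- df[df$Quality == "a" | !(df$Time %in% good_times), ]
print(result)
```

Unlike the slice()-based answer, this filter does not reduce each period to a single row: duplicate "a" rows, or several "b" rows in a period without any "a", would all be kept.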

So the expected output is:

  Quality       Time Value
1       a 2017-12-31   800
2       a 2018-12-31   500
3       b 2019-12-31   200

I tried to solve this problem with an if statement, but failed. My real data has over 10000 rows and multiple dates. Any help is appreciated.

Recommended answer

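You can do this with the help of match(). Within each Time group, match(c('a', 'b'), Quality) returns the position of the first 'a' row and of the first 'b' row (NA when that quality is absent). na.omit() drops the NAs and first() takes the first remaining position, so the 'a' row is kept whenever one exists and the 'b' row is used only as a fallback: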

library(dplyr)

df %>%
  group_by(Time) %>%
  # Per period: take the first "a" row if there is one, otherwise the first "b" row
  slice(first(na.omit(match(c('a', 'b'), Quality)))) %>%
  ungroup()

#  Quality Time       Value
#  <chr>   <chr>      <dbl>
#1 a       2017-12-31   800
#2 a       2018-12-31   500
#3 b       2019-12-31   200
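If the real data has more than two quality levels, one possible generalization (a different technique from the answer above, offered as a sketch) is to rank the levels, sort each period best-first, and keep the first row per Time with duplicated(). It assumes a hypothetical ranking vector quality_rank listing the levels from best to worst, and that Time is stored as ISO-formatted strings (or Date), so sorting it is chronological. Levels missing from quality_rank get NA from match() and are sorted last, so they are only kept when nothing better exists.

```r
# Sample data from the question
df <- data.frame(
  Quality = c("a", "a", "b", "b", "b"),
  Time = c("2017-12-31", "2018-12-31", "2017-12-31", "2018-12-31", "2019-12-31"),
  Value = c(800, 500, 750, 480, 200),
  stringsAsFactors = FALSE
)

# Hypothetical ranking of quality levels, best first (assumption: "a" beats "b")
quality_rank <- c("a", "b")

# Sort by period, then by quality rank, so the best row of each period comes first
ordered <- df[order(df$Time, match(df$Quality, quality_rank)), ]

# Keep only the first (best-ranked) row of each period
best <- ordered[!duplicated(ordered$Time), ]
print(best)
```

Like the slice() answer, this keeps exactly one row per period, and because it uses only vectorized base functions it avoids a per-row if statement on large data.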