Elegant way to filter records based on multiple criteria using R


Problem description

I have a data frame as shown below:

test_df <- data.frame("subject_id" = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3), 
                      "date_1" = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
                                   "01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006", 
                                   "12/31/2009", "01/01/2015", "01/01/2009"))

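Because `date_1` above stores "mm/dd/yyyy" text, sorting it directly compares the month before the year, so the result is not chronological. A minimal base-R sketch (no packages; the vector `x` is an illustrative subset holding subject 2's dates) that contrasts sorting the raw strings with sorting after parsing them with `as.Date()`:

```r
# Subject 2's dates exactly as stored in the question: "mm/dd/yyyy" strings
x <- c("12/31/2005", "01/01/2007", "01/01/2013", "12/31/2008")

# Sorting the raw text puts every "01/..." before every "12/...", ignoring the year
sort(x)       # "01/01/2007" "01/01/2013" "12/31/2005" "12/31/2008"

# Parsing to Date first gives a true chronological order
parsed <- as.Date(x, format = "%m/%d/%Y")
sort(parsed)  # "2005-12-31" "2007-01-01" "2008-12-31" "2013-01-01"
```
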
What I want to do:


1. Arrange each subject's dates in ascending order (i.e. sort within groups).

2. Remove date records for each subject based on the criteria below (the year doesn't matter):

2a. If the subject's first record is Jan 1st, remove only its Dec 31st records (e.g. subject_id = 1).

2b. If the subject's first record is Dec 31st, remove only its Jan 1st records (e.g. subject_id = 2).

2c. If the subject has both Dec 31st and Jan 1st among its non-first records (i.e. from the 2nd record to the end), remove only its Dec 31st records (e.g. subject_id = 3).
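
The three rules can be expressed as one small function applied to a single subject's dates. This is a hypothetical base-R helper (`keep_rows` is an illustrative name, not from the question) that assumes the dates are already `Date` values sorted ascending and that each subject has at least one record. It checks 2a, then 2b, then 2c, so a subject whose first record is Dec 31st follows 2b even if its later records would also match 2c; that precedence is an assumption, since the question does not say which rule wins.

```r
# Hypothetical helper (not from the question): `dates` is one subject's Date
# vector, assumed non-empty and already sorted ascending. Returns TRUE for the
# rows to keep. Checking 2a, then 2b, then 2c is an assumed precedence.
keep_rows <- function(dates) {
  md <- format(dates, "%m-%d")   # month-day key, e.g. "12-31"; the year is ignored
  first_md <- md[1]
  rest <- md[-1]                  # the subject's non-first records
  if (first_md == "01-01") {
    return(md != "12-31")         # 2a: drop Dec 31st records
  }
  if (first_md == "12-31") {
    return(md != "01-01")         # 2b: drop Jan 1st records
  }
  if (all(c("01-01", "12-31") %in% rest)) {
    return(md != "12-31")         # 2c: drop Dec 31st records
  }
  rep(TRUE, length(md))           # no rule applies: keep every record
}

# Subject 3 from the question, already sorted by date
s3 <- as.Date(c("2006-03-04", "2009-01-01", "2009-12-31", "2015-01-01"))
keep_rows(s3)     # returns TRUE TRUE FALSE TRUE
s3[keep_rows(s3)]
```

On a data frame that is grouped by `subject_id` and sorted by date, the same helper could be used as `filter(keep_rows(date_1))` in dplyr.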

I am trying the following:

library(dplyr)
library(lubridate)
sorted <- test_df %>% mutate(date_1 = mdy(date_1)) %>% arrange(subject_id, date_1)  # parse the dates, then sort within each subject
sorted$month <- month(sorted$date_1)  # get the month
sorted$day <- day(sorted$date_1)      # get the day of the month
filter(sorted, month == 12 & day == 31)  # combine conditions with `&`; `and` is not an R operator
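
In R, conditions are combined element-wise with `&`; `and` is not an R operator (it is a syntax error), and `&&` is meant for a single TRUE/FALSE value such as an `if` condition, not for filtering a vector. The month and day are also safest taken from parsed `Date` values rather than the raw strings. A minimal base-R sketch, assuming a few of the question's dates and illustrative names `month_num`, `day_num` and `is_dec31`:

```r
# A few of the question's "mm/dd/yyyy" strings, parsed into Date values
d <- as.Date(c("01/01/2003", "12/31/2007", "12/30/2008"), format = "%m/%d/%Y")

month_num <- as.integer(format(d, "%m"))  # month as a number, 1-12
day_num   <- as.integer(format(d, "%d"))  # day of the month, 1-31

# `&` compares element by element and returns one TRUE/FALSE per date
is_dec31 <- month_num == 12 & day_num == 31
d[is_dec31]   # "2007-12-31"
```
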

Can you help me filter out records based on my criteria?

I expect my output to look like the one shown below.

Recommended answer

library(dplyr)

starting_names <- names(test_df)

test_df %>% 
  mutate(date_1 = lubridate::mdy(date_1)) %>%   # parse "mm/dd/yyyy" strings into Date
  group_by(subject_id) %>% 
  arrange(date_1, .by_group = TRUE) %>%          # sort ascending within each subject
  mutate(
    without_year = format(date_1, "%m-%d"),      # month-day key; the year is ignored
    first_date = first(without_year),            # month-day of the subject's first record
    has_both = all(c("01-01", "12-31") %in% tail(without_year, -1))  # both among non-first records?
  ) %>%
  filter(!(first_date == "01-01" & without_year == "12-31")) %>%   # rule 2a
  filter(!(first_date == "12-31" & without_year == "01-01")) %>%   # rule 2b
  filter(!(first_date != "01-01" & first_date != "12-31" & has_both & without_year == "12-31")) %>%  # rule 2c
  select(all_of(starting_names)) %>%
  ungroup()

This gives:

# A tibble: 7 x 2
  subject_id date_1    
       <dbl> <date>    
1          1 2003-01-01
2          1 2008-12-30
3          2 2005-12-31
4          2 2008-12-31
5          3 2006-03-04
6          3 2009-01-01
7          3 2015-01-01
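
For comparison, the same three filters can be written without dplyr or lubridate. This base-R sketch is an alternative technique, not the answer's code: it sorts with `order()`, uses `ave()` to copy each subject's first month-day and its "has both" flag onto every row, and combines rules 2a-2c into one logical `drop` vector. Like the answer, it assumes rule 2c applies only when the first record is neither Jan 1st nor Dec 31st; the names `df`, `md`, `first_md`, `has_both`, `drop` and `result` are illustrative.

```r
# Sample data from the question
test_df <- data.frame("subject_id" = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                      "date_1" = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
                                   "01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
                                   "12/31/2009", "01/01/2015", "01/01/2009"))

df <- test_df
df$date_1 <- as.Date(df$date_1, format = "%m/%d/%Y")   # parse "mm/dd/yyyy" into Date
df <- df[order(df$subject_id, df$date_1), ]              # sort ascending within each subject

md <- format(df$date_1, "%m-%d")                          # month-day key; the year is ignored

# Month-day of each subject's first (earliest) record, repeated on all of its rows
first_md <- ave(md, df$subject_id, FUN = function(x) rep(x[1], length(x)))

# TRUE on every row of a subject whose non-first records contain both "01-01" and "12-31".
# ave() writes FUN's results back into a copy of `md`, so the logicals come back as
# the strings "TRUE"/"FALSE"; as.logical() converts them back.
has_both <- as.logical(ave(md, df$subject_id, FUN = function(x) {
  rep(all(c("01-01", "12-31") %in% x[-1]), length(x))
}))

drop <- (first_md == "01-01" & md == "12-31") |                              # rule 2a
        (first_md == "12-31" & md == "01-01") |                              # rule 2b
        (!(first_md %in% c("01-01", "12-31")) & has_both & md == "12-31")   # rule 2c

result <- df[!drop, ]
rownames(result) <- NULL
result
```

On the sample data this keeps the same seven subject/date pairs as the dplyr version, in date order within each subject.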
