Elegant way to filter records based on multiple criteria using R
Question
I have a data frame like the one shown below:
test_df <- data.frame("subject_id" = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"date_1" = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
"01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
"12/31/2009", "01/01/2015", "01/01/2009"))
What I would like to do is:
1. Arrange the dates in ascending order for each subject (sort ascending within groups).
2. Remove date records for each subject based on the criteria below (the year doesn't matter):
2a. Remove only Dec 31st records if the first record of the subject is Jan 1st (e.g. subject_id = 1).
2b. Remove only Jan 1st records if the first record of the subject is Dec 31st (e.g. subject_id = 2).
2c. Remove only Dec 31st records if the subject has both Dec 31st and Jan 1st in its non-first records, meaning from the 2nd record to the end of its records (e.g. subject_id = 3).
I am trying the following:
sorted <- test_df %>% arrange(date_1,group_by = subject_id) #Am I right in sorts the dates within group?
test_df$month = month(test_df$date_1) #get the month
test_df$day = day(test_df$date_1) #get the year
filter(test_df, month==12 and day == 31) # doesn't work here
Can you help me filter out records based on my criteria?
I expect my output to be as shown below
Recommended answer
library(dplyr)

starting_names <- names(test_df)
test_df %>%
  mutate(date_1 = lubridate::mdy(date_1)) %>%   # parse "mm/dd/yyyy" strings into dates
  group_by(subject_id) %>%
  arrange(date_1, .by_group = TRUE) %>%         # sort ascending within each subject
  mutate(
    without_year = format(date_1, "%m-%d"),
    first_date = first(without_year),
    has_both = all(c("01-01", "12-31") %in% tail(without_year, -1))
  ) %>%
  filter(!(first_date == "01-01" & without_year == "12-31")) %>%   # rule 2a
  filter(!(first_date == "12-31" & without_year == "01-01")) %>%   # rule 2b
  filter(!(first_date != "01-01" & first_date != "12-31" & has_both == TRUE & without_year == "12-31")) %>%   # rule 2c
  select(all_of(starting_names)) %>%
  ungroup()
This gives:
# A tibble: 7 x 2
  subject_id date_1
       <dbl> <date>
1          1 2003-01-01
2          1 2008-12-30
3          2 2005-12-31
4          2 2008-12-31
5          3 2006-03-04
6          3 2009-01-01
7          3 2015-01-01
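The attempt in the question can also be repaired along its own lines, using month and day rather than formatted strings. Three things get in its way: `arrange(date_1, group_by = subject_id)` treats `group_by = subject_id` as one more sort key rather than as a grouping, the dates are still character strings and need parsing before `month()` and `day()` can be relied on, and R's logical operator is `&`, not `and`. The following is only a minimal sketch, assuming `dplyr` and `lubridate` are installed; the helper columns `is_jan1`, `is_dec31`, `rest_has_both`, and `drop` are illustrative names that do not appear in the original answer.

```r
library(dplyr)
library(lubridate)

test_df <- data.frame(
  subject_id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  date_1 = c("01/01/2003", "12/31/2007", "12/30/2008", "12/31/2005",
             "01/01/2007", "01/01/2013", "12/31/2008", "03/04/2006",
             "12/31/2009", "01/01/2015", "01/01/2009")
)

result <- test_df %>%
  mutate(date_1 = mdy(date_1)) %>%         # parse "mm/dd/yyyy" strings into Date
  group_by(subject_id) %>%
  arrange(date_1, .by_group = TRUE) %>%    # sort ascending within each subject
  mutate(
    is_jan1  = month(date_1) == 1  & day(date_1) == 1,
    is_dec31 = month(date_1) == 12 & day(date_1) == 31,
    # TRUE when the records after the first include both a Jan 1st and a Dec 31st
    rest_has_both = any(is_jan1[-1]) && any(is_dec31[-1]),
    # hypothetical flag marking the rows to remove
    drop = (first(is_jan1) & is_dec31) |                      # rule 2a
      (first(is_dec31) & is_jan1) |                           # rule 2b
      (!first(is_jan1) & !first(is_dec31) &
         rest_has_both & is_dec31)                            # rule 2c
  ) %>%
  filter(!drop) %>%
  ungroup() %>%
  select(subject_id, date_1)

print(result)
```

Collecting the three rules into a single `drop` flag keeps one `filter()` call and makes it easy to inspect which rows would be removed before they are actually dropped.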