How do I remove rows based on a range of dates given by values in 2 columns?
Problem description
I have a data set that includes a range of dates and need to fill in the missing dates in new rows. df1 is an example of the data I am working with, and df2 is an example of what I've managed to achieve (where I'm stuck). df3 is where I would like to end up!
df1
ID Date DateStart DateEnd
1 2/11/2021 2/11/2021 2/17/2021
1 2/19/2021 2/19/2021 2/21/2021
2 1/15/2021 1/15/2021 1/20/2021
2 1/23/2021 1/23/2021 1/24/2021
This is where I am with this. The NAs aren't an issue because I intend to drop the DateStart and DateEnd columns after doing what I need to do. The issue here is that I don't want to include the dates that fall within the previous DateStart and DateEnd range.
To get here, I grouped by ID and filled in the missing dates between the dates in df1:
df2
ID Date DateStart DateEnd
1 2/11/2021 2/11/2021 2/17/2021
1 2/12/2021 NA NA
1 2/13/2021 NA NA
1 2/14/2021 NA NA
1 2/15/2021 NA NA
1 2/16/2021 NA NA
1 2/17/2021 NA NA
1 2/18/2021 NA NA
1 2/19/2021 2/19/2021 2/21/2021
2 1/15/2021 1/15/2021 1/20/2021
2 1/16/2021 NA NA
2 1/17/2021 NA NA
2 1/18/2021 NA NA
2 1/19/2021 NA NA
2 1/20/2021 NA NA
2 1/21/2021 NA NA
2 1/22/2021 NA NA
2 1/23/2021 1/23/2021 1/24/2021
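The question does not show the code used to produce df2. As a rough sketch only, here is one base-R way to get the same shape, assuming df2 covers each ID from its first to its last Date (which matches the rows shown) and that the dates arrive as month/day/year text. The names all_days and col are illustrative, not from the question.

```r
# Sample data from df1 (dates stored as month/day/year text)
df1 <- data.frame(
  ID        = c(1L, 1L, 2L, 2L),
  Date      = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateStart = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateEnd   = c("2/17/2021", "2/21/2021", "1/20/2021", "1/24/2021")
)

# Parse the text columns into Date objects
for (col in c("Date", "DateStart", "DateEnd")) {
  df1[[col]] <- as.Date(df1[[col]], format = "%m/%d/%Y")
}

# For each ID, build every day from its first Date to its last Date
all_days <- do.call(rbind, lapply(split(df1, df1$ID), function(g) {
  data.frame(ID = g$ID[1],
             Date = seq(min(g$Date), max(g$Date), by = "day"))
}))

# Left-join the original rows back on; the new days get NA for DateStart/DateEnd
df2 <- merge(all_days, df1, by = c("ID", "Date"), all.x = TRUE)
df2 <- df2[order(df2$ID, df2$Date), ]
rownames(df2) <- NULL
```

Because every day inside a DateStart-DateEnd range is generated here too, this reproduces the problem described in the question rather than solving it.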
This is actually what I'd like to end up with:
df3
ID Date DateStart DateEnd
1 2/11/2021 2/11/2021 2/17/2021
1 2/18/2021 NA NA
1 2/19/2021 2/19/2021 2/21/2021
2 1/15/2021 1/15/2021 1/20/2021
2 1/21/2021 NA NA
2 1/22/2021 NA NA
2 1/23/2021 1/23/2021 1/24/2021
In df3, the missing dates are filled in, but the dates that fall within a DateStart-DateEnd range are not.
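The rule described above can be stated precisely: within one ID, a day is kept if it starts a range, or if it lies in no interval (DateStart, DateEnd]. A minimal base-R sketch of that test follows; keep_date is a hypothetical helper name, and the example dates are ID 2 from the question.

```r
# keep_date is a hypothetical helper. All arguments are Date vectors for a
# single ID; starts and ends hold that ID's DateStart/DateEnd pairs.
keep_date <- function(dates, starts, ends) {
  d <- as.numeric(dates)
  s <- as.numeric(starts)
  e <- as.numeric(ends)
  # covered[i, j] is TRUE when dates[i] lies in (starts[j], ends[j]]
  covered <- outer(d, s, ">") & outer(d, e, "<=")
  # Keep range start days, plus days not covered by any range
  (d %in% s) | rowSums(covered) == 0
}

# ID 2 from the question: every day from 1/15 to 1/23, as in df2
days   <- seq(as.Date("2021-01-15"), as.Date("2021-01-23"), by = "day")
starts <- as.Date(c("2021-01-15", "2021-01-23"))
ends   <- as.Date(c("2021-01-20", "2021-01-24"))

kept <- days[keep_date(days, starts, ends)]
print(kept)  # 2021-01-15, 2021-01-21, 2021-01-22, 2021-01-23
```

Applied once per ID (for example over split(df2, df2$ID), using that ID's non-NA DateStart/DateEnd values), this filters df2 down to df3. outer() builds a days-by-ranges matrix per ID, so it suits IDs with a modest number of ranges.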
Any thoughts on how to achieve this? Note: I have a dataset with a large number of observations.
Recommended answer
1. Convert the date columns to Date class.
2. For each ID, use complete to create a sequence of dates from the minimum of DateStart to the maximum of DateEnd.
3. Fill the NA values in DateStart and DateEnd with the previous non-NA value, then set them back to NA where Date > DateEnd.
4. For every group of ID, DateStart and DateEnd, keep the rows with NA values, or row number 1 in each group.

```r
library(dplyr)
library(tidyr)

df %>%
  # Parse the month/day/year strings into Date objects
  mutate(across(-ID, lubridate::mdy)) %>%
  group_by(ID) %>%
  # One row per day from the first DateStart to the last DateEnd of each ID
  complete(Date = seq(min(DateStart), max(DateEnd), by = '1 day')) %>%
  # Carry the most recent range down into the newly created rows
  fill(DateStart, DateEnd) %>%
  ungroup %>%
  # Days after the carried-down DateEnd are not covered by any range
  mutate(across(c(DateStart, DateEnd), ~replace(., Date > DateEnd, NA))) %>%
  group_by(ID, DateStart, DateEnd) %>%
  # Keep uncovered days plus the first (start) day of every range
  filter(is.na(DateStart) | row_number() == 1)

#      ID Date       DateStart  DateEnd
#   <int> <date>     <date>     <date>
# 1     1 2021-02-11 2021-02-11 2021-02-17
# 2     1 2021-02-18 NA         NA
# 3     1 2021-02-19 2021-02-19 2021-02-21
# 4     2 2021-01-15 2021-01-15 2021-01-20
# 5     2 2021-01-21 NA         NA
# 6     2 2021-01-22 NA         NA
# 7     2 2021-01-23 2021-01-23 2021-01-24
```
Data

```r
df <- structure(list(
  ID        = c(1L, 1L, 2L, 2L),
  Date      = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateStart = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateEnd   = c("2/17/2021", "2/21/2021", "1/20/2021", "1/24/2021")
), class = "data.frame", row.names = c(NA, -4L))
```
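For readers who cannot load dplyr/tidyr/lubridate, the same four steps can be sketched in base R. This is a port, not the answerer's code: complete() + fill() are swapped for seq() + findInterval(), which, like fill(), carries forward only the most recent range that started on or before each day. The names result, idx, gap and keep are illustrative.

```r
# Same sample data as in the Data section
df <- data.frame(
  ID        = c(1L, 1L, 2L, 2L),
  Date      = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateStart = c("2/11/2021", "2/19/2021", "1/15/2021", "1/23/2021"),
  DateEnd   = c("2/17/2021", "2/21/2021", "1/20/2021", "1/24/2021")
)

# Step 1: convert the date columns to Date class
for (col in c("Date", "DateStart", "DateEnd")) {
  df[[col]] <- as.Date(df[[col]], format = "%m/%d/%Y")
}

result <- do.call(rbind, lapply(split(df, df$ID), function(g) {
  g <- g[order(g$DateStart), ]
  # Step 2: one row per day from the first DateStart to the last DateEnd
  days <- seq(min(g$DateStart), max(g$DateEnd), by = "day")
  # Step 3: index of the latest range starting on or before each day
  # (the equivalent of filling DateStart/DateEnd downwards)
  idx   <- findInterval(as.numeric(days), as.numeric(g$DateStart))
  start <- g$DateStart[idx]
  end   <- g$DateEnd[idx]
  # ...then blank out days that fall after that range's DateEnd
  gap <- days > end
  start[gap] <- NA
  end[gap]   <- NA
  # Step 4: keep the first day of each range, plus every uncovered day
  keep <- gap | days == start
  data.frame(ID = g$ID[1], Date = days[keep],
             DateStart = start[keep], DateEnd = end[keep])
}))
rownames(result) <- NULL
print(result)
```

On the sample data this yields the same seven rows as the dplyr pipeline.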