Calculating active days/months from overlapping dates
Problem description
I have data listing start and end dates for different products for a large number of customers. The intervals for different products can overlap, or there can be time gaps between purchases:
    library(lubridate)
    library(Hmisc)
    library(dplyr)

    user_id <- c(rep(12, 8), rep(33, 5))
    start_date <- dmy(Cs(31/10/2010, 18/12/2010, 31/10/2011, 18/12/2011,
                         27/03/2014, 18/12/2014, 27/03/2015, 18/12/2016,
                         01/07/1992, 20/08/1993, 28/10/1999, 31/01/2006,
                         26/08/2016))
    end_date <- dmy(Cs(31/10/2011, 18/12/2011, 28/04/2014, 18/12/2014,
                       27/03/2015, 18/12/2016, 27/03/2016, 18/12/2017,
                       01/07/2016, 16/08/2016, 15/11/2012, 28/02/2006,
                       26/01/2017))
    data <- data.frame(user_id, start_date, end_date)
    data

       user_id start_date   end_date
    1       12 2010-10-31 2011-10-31
    2       12 2010-12-18 2011-12-18
    3       12 2011-10-31 2014-04-28
    4       12 2011-12-18 2014-12-18
    5       12 2014-03-27 2015-03-27
    6       12 2014-12-18 2016-12-18
    7       12 2015-03-27 2016-03-27
    8       12 2016-12-18 2017-12-18
    9       33 1992-07-01 2016-07-01
    10      33 1993-08-20 2016-08-16
    11      33 1999-10-28 2012-11-15
    12      33 2006-01-31 2006-02-28
    13      33 2016-08-26 2017-01-26
I'd like to calculate the total number of active days or months during which each customer held any of the products.
It wouldn't be a problem if the products always overlapped, because then I could simply take:
    data %>%
      group_by(user_id) %>%
      dplyr::summarize(time_diff = max(end_date) - min(start_date))
However, as you can see for user 33, products don't always overlap, so an interval that is separated from the others by a gap has to be counted on its own and added to the length of the 'overlapped' intervals.
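To make the gap concrete, here is a minimal base R sketch that uses only user 33's rows from the data above. The variable names (`starts`, `ends`, `naive_span`, `latest_earlier_end`, `gap_days`) are illustrative and not part of the original question.

```r
# User 33's intervals, copied from the data above (ISO dates, base R only)
starts <- as.Date(c("1992-07-01", "1993-08-20", "1999-10-28",
                    "2006-01-31", "2016-08-26"))
ends   <- as.Date(c("2016-07-01", "2016-08-16", "2012-11-15",
                    "2006-02-28", "2017-01-26"))

# The max/min approach measures the whole span, gaps included
naive_span <- max(ends) - min(starts)

# The last product starts after every earlier product has ended,
# so the days strictly between those two dates are inactive
latest_earlier_end <- max(ends[-5])
gap_days <- as.numeric(starts[5] - latest_earlier_end) - 1

naive_span
gap_days
```

The value of `gap_days` is the number of days by which the max/min span overstates user 33's active time.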
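Is there a quick and elegant way to code it, hopefully in dplyr?
Solution
We can use functions from dplyr to count the total number of days. The following example expands each time period into one row per day and then removes duplicated dates. Finally, it counts the number of rows for each user_id.

    data2 <- data %>%
      rowwise() %>%
      # one row per calendar day in the interval, start and end days included
      # (tibble() replaces the deprecated data_frame())
      do(tibble(user_id = .$user_id,
                Date = seq(.$start_date, .$end_date, by = 1))) %>%
      # drop days that are covered by more than one product
      distinct() %>%
      ungroup() %>%
      # the remaining rows per user are that user's active days
      count(user_id)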
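As an alternative to expanding every interval into individual days, overlapping intervals can first be merged per user and the merged lengths summed, which may scale better for long intervals. The sketch below is not from the original answer. It assumes the same sample data (rebuilt with `as.Date()` so it runs on its own) and counts both the start and the end day of each interval as active, to match the `seq()`-based expansion. The column names `prev_max_end`, `block`, `active_days`, and `active_months_approx` are illustrative. The month figure is only a rough approximation that assumes an average month of 365.25 / 12 days.

```r
library(dplyr)

# Same sample data as in the question, built with base R's as.Date()
data <- data.frame(
  user_id = c(rep(12, 8), rep(33, 5)),
  start_date = as.Date(c(
    "2010-10-31", "2010-12-18", "2011-10-31", "2011-12-18", "2014-03-27",
    "2014-12-18", "2015-03-27", "2016-12-18", "1992-07-01", "1993-08-20",
    "1999-10-28", "2006-01-31", "2016-08-26")),
  end_date = as.Date(c(
    "2011-10-31", "2011-12-18", "2014-04-28", "2014-12-18", "2015-03-27",
    "2016-12-18", "2016-03-27", "2017-12-18", "2016-07-01", "2016-08-16",
    "2012-11-15", "2006-02-28", "2017-01-26"))
)

# Step 1: merge overlapping intervals into blocks, per user
merged <- data %>%
  arrange(user_id, start_date) %>%
  group_by(user_id) %>%
  mutate(
    # latest end date among all earlier rows of this user
    # (as.numeric() because cummax() is not defined for Date objects)
    prev_max_end = lag(cummax(as.numeric(end_date)), default = -Inf),
    # a new block starts when an interval begins after all earlier ones ended
    block = cumsum(as.numeric(start_date) > prev_max_end)
  ) %>%
  group_by(user_id, block) %>%
  summarise(start_date = min(start_date),
            end_date = max(end_date),
            .groups = "drop")

# Step 2: sum the block lengths, counting both end points as active days
active <- merged %>%
  group_by(user_id) %>%
  summarise(active_days = sum(as.numeric(end_date - start_date) + 1)) %>%
  mutate(active_months_approx = active_days / (365.25 / 12))

merged
active
```

With this data, `merged` should contain a single block for user 12 and two blocks for user 33, and `active_days` should agree with the `n` column produced by the day-expansion approach.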