Calculating active days/months from overlapping dates


Problem description

I have data listing start and end dates of different products for a large number of customers. The intervals for different products can overlap, or there can be time gaps between purchases:

library(lubridate)
library(Hmisc)
library(dplyr)

user_id <- c(rep(12, 8), rep(33, 5))

start_date <- dmy(Cs(31/10/2010, 18/12/2010, 31/10/2011, 18/12/2011, 27/03/2014, 18/12/2014, 27/03/2015, 18/12/2016,
                     01/07/1992, 20/08/1993, 28/10/1999, 31/01/2006, 26/08/2016))

end_date <- dmy(Cs(31/10/2011, 18/12/2011, 28/04/2014, 18/12/2014, 27/03/2015, 18/12/2016, 27/03/2016, 18/12/2017,
                   01/07/2016, 16/08/2016, 15/11/2012, 28/02/2006, 26/01/2017))

data <- data.frame(user_id, start_date, end_date)

data
   user_id start_date   end_date
1       12 2010-10-31 2011-10-31
2       12 2010-12-18 2011-12-18
3       12 2011-10-31 2014-04-28
4       12 2011-12-18 2014-12-18
5       12 2014-03-27 2015-03-27
6       12 2014-12-18 2016-12-18
7       12 2015-03-27 2016-03-27
8       12 2016-12-18 2017-12-18
9       33 1992-07-01 2016-07-01
10      33 1993-08-20 2016-08-16
11      33 1999-10-28 2012-11-15
12      33 2006-01-31 2006-02-28
13      33 2016-08-26 2017-01-26

For each user, I'd like to calculate the total number of active days or months during which he/she held any of the products.

It wouldn't be a problem if the products ALWAYS overlapped as then I could simply take

data %>%
  group_by(user_id) %>%
  dplyr::summarize(time_diff = max(end_date) - min(start_date))

However, as you can see for user 33, products don't always overlap. An interval that is separated from the others by a gap has to be counted on its own and then added to the total of the 'overlapped' intervals.
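To make the gap concrete, here is a minimal sketch that looks only at user 33. The names `u33`, `prev_end` and `gap_days` are made up for this illustration, and the dates are retyped from the table above. It sorts the intervals by start date and measures how many days fall between the latest end date seen so far and the next start date.

```r
library(dplyr)

# Intervals for user 33, retyped from the example data (illustrative object name)
u33 <- data.frame(
  start_date = as.Date(c("1992-07-01", "1993-08-20", "1999-10-28",
                         "2006-01-31", "2016-08-26")),
  end_date   = as.Date(c("2016-07-01", "2016-08-16", "2012-11-15",
                         "2006-02-28", "2017-01-26"))
)

gaps <- u33 %>%
  arrange(start_date) %>%
  mutate(
    # Latest end date among all earlier intervals (NA for the first row).
    # cummax() is not defined for Date objects, hence as.numeric().
    prev_end = lag(cummax(as.numeric(end_date))),
    # Days strictly between prev_end and this start; 0 when intervals overlap or touch
    gap_days = pmax(as.numeric(start_date) - prev_end - 1, 0, na.rm = TRUE)
  )

# Total number of days on which user 33 held no product
sum(gaps$gap_days)
```

Any user with a non-zero total here is one for whom `max(end_date) - min(start_date)` overstates the active time.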

Is there a quick and elegant way to code it, hopefully in dplyr?

Solution

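We can use functions from dplyr to count the total number of days. The following example expands each time period into one row per day (both the start and the end date are included), and then removes duplicated dates so that a day covered by several products is counted only once. Finally, it counts the number of remaining rows for each user_id.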

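data2 <- data %>%
  rowwise() %>%
  # One row per calendar day in each interval (end points included).
  # tibble() replaces the deprecated data_frame().
  do(tibble(user_id = .$user_id,
            Date = seq(.$start_date, .$end_date, by = 1))) %>%
  # Drop days that are covered by more than one product
  distinct() %>%
  ungroup() %>%
  # n = number of distinct active days per user
  count(user_id)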
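Expanding every interval to daily rows can get large when there are many customers with long periods. As an alternative, not part of the answer above, the overlapping intervals can be merged first and their lengths summed. This is only a sketch under a few assumptions: the column names `prev_end`, `block`, `block_start` and `block_end` are invented here, the example data is rebuilt with base R so the block runs on its own, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question, rebuilt with as.Date() so no extra packages are needed
data <- data.frame(
  user_id = c(rep(12, 8), rep(33, 5)),
  start_date = as.Date(c(
    "2010-10-31", "2010-12-18", "2011-10-31", "2011-12-18",
    "2014-03-27", "2014-12-18", "2015-03-27", "2016-12-18",
    "1992-07-01", "1993-08-20", "1999-10-28", "2006-01-31", "2016-08-26")),
  end_date = as.Date(c(
    "2011-10-31", "2011-12-18", "2014-04-28", "2014-12-18",
    "2015-03-27", "2016-12-18", "2016-03-27", "2017-12-18",
    "2016-07-01", "2016-08-16", "2012-11-15", "2006-02-28", "2017-01-26"))
)

active_days <- data %>%
  arrange(user_id, start_date) %>%
  group_by(user_id) %>%
  mutate(
    # Latest end date seen in any earlier row of this user (NA for the first row)
    prev_end = lag(cummax(as.numeric(end_date))),
    # A new block starts when an interval begins after everything before it has ended
    block = cumsum(is.na(prev_end) | as.numeric(start_date) > prev_end)
  ) %>%
  group_by(user_id, block) %>%
  summarise(block_start = min(start_date),
            block_end   = max(end_date),
            .groups = "drop") %>%
  group_by(user_id) %>%
  # + 1 counts both end points, matching seq(start_date, end_date, by = 1)
  summarise(n = sum(as.numeric(block_end - block_start) + 1))

active_days
```

Because both end points are counted, the result should agree with the row counts from the expansion approach. If approximate months are enough, dividing `n` by an average month length such as 30.44 days is one option, though what counts as an "active month" is a definition the question leaves open.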
