R / lubridate: Calculate number of overlapping days between two periods
Question
I am trying to calculate the number of overlapping days between two time periods. One period is fixed by a start and end date; the other is recorded as start and end dates in a data frame.
I'm dealing with ads that have a publish date (df$start) and an unpublish date (df$end). What I'm trying to find out is how many days they have been online in a specific month (my.start = 2018-08-01, my.end = 2018-08-31).
library(dplyr)
library(lubridate)
my.start <- ymd("2018-08-01")
my.end <- ymd("2018-08-31")
df <- data.frame(start = c("2018-07-15", "2018-07-20", "2018-08-15", "2018-08-20", "2018-09-01"),
                 end   = c("2018-07-20", "2018-08-05", "2018-08-19", "2018-09-15", "2018-09-15"))
# strings to dates
df <- mutate(df, start = ymd(start), end = ymd(end))
# does not work - calculate overlap in days
df <- mutate(df, overlap = intersect(interval(my.start, my.end), interval(start, end)))
Results should be 0, 5, 4, 12, 0 days:
my.start        |-------------------------------|  my.end
        |-----|                                                     (0)
           |---------|                                              (5)
                              |----|                                (4)
                                    |------------------|            (12)
                                                 |---------------|  (0)
In Excel, I would use
=MAX(MIN(my.end, end) - MAX(my.start, start) + 1, 0)
But this doesn't work either:
# does not work - calculate via min/max
df <- mutate(df, overlap = max(min(my.end, end) - max(my.start, start) + 1, 0))
Before I try the Excel approach using as.numeric() on the dates, I wondered if there is a cleverer way to do this.
Actually, the Excel numeric approach doesn't seem to work either (all results are zero):
# does not work - calculate via numeric
ms.num <- as.numeric(my.start)
me.num <- as.numeric(my.end)
df <- df %>%
  mutate(s.num = as.numeric(start),
         e.num = as.numeric(end),
         overlap = max(min(e.num, me.num) - max(s.num, ms.num) + 1, 0))
The approach by @akrun seems to work for ymd dates. However, it doesn't seem to work for ymd_hms times:
library(dplyr)
library(lubridate)
library(purrr)
my.start <- ymd("2018-08-01")
my.end <- ymd("2018-08-31")
df <- data.frame(start = c("2018-07-15 10:00:00", "2018-07-20 10:00:00", "2018-08-15 10:00:00", "2018-08-20 10:00:00", "2018-09-01 10:00:00"),
                 end   = c("2018-07-20 10:00:00", "2018-08-05 10:00:00", "2018-08-19 10:00:00", "2018-09-15 10:00:00", "2018-09-15 10:00:00"))
# strings to dates
df <- mutate(df, start = ymd_hms(start), end = ymd_hms(end))
# leads to 0 results
df %>% mutate(overlap = map2(start, end, ~ sum(seq(.x, .y, by = '1 day') %in% seq(my.start, my.end, by = '1 day'))))
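A likely reason for the zero results is that seq() over ymd_hms values produces timestamps at 10:00:00, which never match the plain dates generated for the window. One possible workaround, sketched here with base R only, is to drop the time of day before comparing. The UTC time zone and the variable names start_day, end_day and overlap_days are assumptions made for illustration; lubridate's as_date() should serve the same purpose as as.Date() here.

```r
# Window from the question as plain dates.
my.start <- as.Date("2018-08-01")
my.end   <- as.Date("2018-08-31")

# Datetimes from the question; UTC is an assumption, not stated in the question.
start <- as.POSIXct(c("2018-07-15 10:00:00", "2018-07-20 10:00:00", "2018-08-15 10:00:00",
                      "2018-08-20 10:00:00", "2018-09-01 10:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2018-07-20 10:00:00", "2018-08-05 10:00:00", "2018-08-19 10:00:00",
                      "2018-09-15 10:00:00", "2018-09-15 10:00:00"), tz = "UTC")

# Drop the time of day so both sides of the comparison are Date objects.
start_day <- as.Date(start, tz = "UTC")
end_day   <- as.Date(end, tz = "UTC")

# Inclusive overlap in whole days, computed element by element.
overlap_days <- pmax(as.numeric(pmin(my.end, end_day) - pmax(my.start, start_day)) + 1, 0)
print(overlap_days)
```

This counts calendar days touched by each ad, with both end points included; partial days are treated as full days.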
Recommended answer
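I think you may be running into the difference between max and min versus pmax and pmin. max and min collapse all their arguments into a single value, so every row receives the same result; pmax and pmin compare element by element, which is what a per-row calculation inside mutate needs: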
library(dplyr)
df %>%
  mutate(overlap = pmax(pmin(my.end, end) - pmax(my.start, start) + 1, 0))
       start        end overlap
1 2018-07-15 2018-07-20  0 days
2 2018-07-20 2018-08-05  5 days
3 2018-08-15 2018-08-19  5 days
4 2018-08-20 2018-09-15 12 days
5 2018-09-01 2018-09-15  0 days
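To make the difference concrete, here is a minimal base R sketch that runs both versions of the formula on the dates from the question, without dplyr or lubridate. The names scalar_overlap and rowwise_overlap are introduced only for this illustration.

```r
# Window and ad dates from the question, built with base R only.
my.start <- as.Date("2018-08-01")
my.end   <- as.Date("2018-08-31")
start <- as.Date(c("2018-07-15", "2018-07-20", "2018-08-15", "2018-08-20", "2018-09-01"))
end   <- as.Date(c("2018-07-20", "2018-08-05", "2018-08-19", "2018-09-15", "2018-09-15"))

# min()/max() reduce ALL their arguments to one value:
# the earliest end (2018-07-20) minus the latest start (2018-09-01) is negative,
# so the outer max() clamps the single result to 0.
scalar_overlap <- max(as.numeric(min(my.end, end) - max(my.start, start)) + 1, 0)

# pmin()/pmax() compare element by element and recycle the scalar window bounds,
# giving one inclusive day count per ad.
rowwise_overlap <- pmax(as.numeric(pmin(my.end, end) - pmax(my.start, start)) + 1, 0)

print(scalar_overlap)
print(rowwise_overlap)
```

The single zero from the scalar version is recycled into every row by mutate, which would explain why all results in the question were zero. Because the + 1 counts both end points, the ad running from 15 to 19 August yields 5 days rather than the 4 listed in the question.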