Aggregate results by date intervals in R
Question
I'm using R, and my data is stored in data.table objects. Each row has the format ID, Date1, Date2, Row.
For each ID there can be more than one entry, and the two dates define a time interval.
I want to aggregate all the entries by ID and overlapping time intervals. I know how to do it with for loops and the like, but I wonder whether there is a better way.
Example:
data = data.table(
id = c(1,1,1,2,2,3,3),
Row = c(1,2,3,4,5,6,7),
Date1 = c("2018-01-01",
"2018-01-05",
"2018-01-21",
"2018-01-01",
"2018-01-15",
"2018-01-01",
"2018-01-19"),
Date2 = c("2018-01-10",
"2018-01-20",
"2018-01-22",
"2018-01-31",
"2018-01-19",
"2018-01-15",
"2018-01-23"))
The desired output would be something that identifies the following groups of rows: ((1,2),(3),(4,5),(6),(7)), so that I can generate a new ID based on this grouping.
Recommended answer
Output:
   id Row      Date1      Date2 g
1:  1   1 2018-01-01 2018-01-10 0
2:  1   2 2018-01-05 2018-01-20 0
3:  1   3 2018-01-21 2018-01-22 1
4:  2   4 2018-01-01 2018-01-31 2
5:  2   5 2018-01-15 2018-01-19 2
6:  3   6 2018-01-01 2018-01-15 3
7:  3   7 2018-01-19 2018-01-23 4
Data:
library(data.table)
data = data.table(
id = c(1,1,1,2,2,3,3),
Row = c(1,2,3,4,5,6,7),
Date1 = c("2018-01-01","2018-01-05","2018-01-21","2018-01-01","2018-01-15","2018-01-01","2018-01-19"),
Date2 = c("2018-01-10","2018-01-20","2018-01-22","2018-01-31","2018-01-19","2018-01-15","2018-01-23"))
cols <- c("Date1", "Date2")
data[, (cols) := lapply(.SD, as.Date, format="%Y-%m-%d"), .SDcols=cols]
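The text above shows the output and the data but not the step that computes the g column. The following is a minimal sketch of one way to get that grouping, not necessarily the approach the answer used. It assumes that an interval starting on the same day a previous one ends counts as overlapping, and the helper column new_grp is a name introduced here only for illustration.

```r
library(data.table)

# Same data as above, with the date columns already converted to Date
data <- data.table(
  id    = c(1, 1, 1, 2, 2, 3, 3),
  Row   = 1:7,
  Date1 = as.Date(c("2018-01-01", "2018-01-05", "2018-01-21", "2018-01-01",
                    "2018-01-15", "2018-01-01", "2018-01-19")),
  Date2 = as.Date(c("2018-01-10", "2018-01-20", "2018-01-22", "2018-01-31",
                    "2018-01-19", "2018-01-15", "2018-01-23"))
)

# Sort so that, within each id, intervals are visited by start date
setorder(data, id, Date1)

# A row opens a new group when it is the first row of its id, or when it
# starts after the latest end date seen so far within that id.
# new_grp is a hypothetical helper column, dropped again below.
data[, new_grp := {
  prev_max_end <- shift(cummax(as.numeric(Date2)))  # NA on the first row of each id
  is.na(prev_max_end) | as.numeric(Date1) > prev_max_end
}, by = id]

# Running count of group starts over the whole table, zero-based
data[, g := cumsum(new_grp) - 1L]
data[, new_grp := NULL]

print(data)
```

Using the running maximum of Date2 rather than only the previous row's end date matters when one long interval contains several shorter ones, as with rows 4 and 5. To treat touching intervals as separate groups, the comparison would be >= instead of >.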