Expand rows by date range using start and end date
Question
Consider a data frame of the form
idnum start end
1993.1 17 1993-01-01 1993-12-31
1993.2 17 1993-01-01 1993-12-31
1993.3 17 1993-01-01 1993-12-31
where start and end are of class Date:
$ idnum : int 17 17 17 17 27 27
$ start : Date, format: "1993-01-01" "1993-01-01" "1993-01-01" "1993-01-01" ...
$ end : Date, format: "1993-12-31" "1993-12-31" "1993-12-31" "1993-12-31" ...
I would like to create a new data frame that instead has monthly observations for every row, one for every month between start and end (including the boundaries):
Desired output
idnum month
17 1993-01-01
17 1993-02-01
17 1993-03-01
...
17 1993-11-01
17 1993-12-01
I'm not sure what format month should have. At some point I will want to group by idnum, month for regressions on the rest of the data set.
So far, for any single row, seq(from=test[1,'start'], to=test[1, 'end'], by='1 month') gives me the right sequence, but as soon as I try to apply that to the whole data frame, it fails:
> foo <- apply(test, 1, function(x) seq(x['start'], to=x['end'], by='1 month'))
Error in to - from : non-numeric argument to binary operator
Recommended answer
Using data.table:
require(data.table) ## 1.9.2+
setDT(df)[ , list(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]
# you may use dot notation as a shorthand alias of list in j:
setDT(df)[ , .(idnum = idnum, month = seq(start, end, by = "month")), by = 1:nrow(df)]
setDT converts df to a data.table. Then, for each row (by = 1:nrow(df)), we create idnum and month as required.
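For readers wondering why the apply() call in the question fails, a likely explanation is that apply() first coerces the data frame to a matrix, and with Date columns present that matrix is character, so seq() no longer receives Date objects. The following is a minimal base R sketch of a row-wise alternative, not part of the original answer. The sample data are hypothetical and only mirror the structure shown in the question (the second row's end date is made up for illustration).

```r
# Hypothetical sample data with the same columns as in the question
test <- data.frame(
  idnum = c(17L, 27L),
  start = as.Date(c("1993-01-01", "1993-01-01")),
  end   = as.Date(c("1993-12-31", "1993-03-31"))
)

# apply() works on as.matrix(test); the Date columns become strings there
start_in_matrix <- as.matrix(test)[1, "start"]
is.character(start_in_matrix)

# Index the original columns by row number so the Date class is preserved
pieces <- lapply(seq_len(nrow(test)), function(i) {
  data.frame(
    idnum = test$idnum[i],
    month = seq(test$start[i], test$end[i], by = "month")
  )
})

# Stack the per-row pieces into one long data frame
result <- do.call(rbind, pieces)
result
```

Here month stays a Date set to the first day of each month, which should work as a grouping key together with idnum. On larger data the data.table version above is likely to be faster than building many small data frames.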