R -- Expand date range into panel data by group

Question

I have date ranges that are grouped by two variables (id and type) that are currently stored in a data frame called data. My goal is to expand the date range such that I have a row for each day within the range of dates, which includes the same id and type.

Here is a snippet to reproduce an example of the data frame:

data <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), type = c("a", 
"a", "b", "c", "b", "a", "c", "d", "e", "f"), from = structure(c(1235199600, 
1235545200, 1235545200, 1235631600, 1235631600, 1242712800, 1242712800, 
1243058400, 1243058400, 1243231200), class = c("POSIXct", "POSIXt"
), tzone = ""), to = structure(c(1235372400, 1235545200, 1235631600, 
1235890800, 1236236400, 1242712800, 1243058400, 1243231200, 1243144800, 
1243576800), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("id", 
"type", "from", "to"), row.names = c(700L, 753L, 2941L, 2178L, 
 2959L, 679L, 2185L, 12L, 802L, 1796L), class = "data.frame")

This is a visual representation of the data set:

id  type  from        to
1   a     2009-02-21  2009-02-23
1   a     2009-02-25  2009-02-25
1   b     2009-02-25  2009-02-26
1   c     2009-02-26  2009-03-01
1   b     2009-02-26  2009-03-05
2   a     2009-05-19  2009-05-19
2   c     2009-05-19  2009-05-23
2   d     2009-05-23  2009-05-25
2   e     2009-05-23  2009-05-24
2   f     2009-05-25  2009-05-29

Here is a visual representation of the intended result:

id  type  date
1   a     2009-02-21
1   a     2009-02-22
1   a     2009-02-23
1   b     2009-02-25
1   b     2009-02-26
1   c     2009-02-26
1   c     2009-02-27
1   c     2009-02-28
1   c     2009-03-01
...
2   f     2009-05-25
2   f     2009-05-26
2   f     2009-05-27
2   f     2009-05-28
2   f     2009-05-29

I've found several similar posts (link and link) that were helpful in giving me a starting point. I've attempted to use a plyr solution:

data2 <- adply(data, 1, summarise, date = seq(data$from, data$to))[c('id', 'type')]

However, this results in an error:

Error: 'from' must be of length 1

I have also attempted to use a data.table solution:

data[, list(date = seq(from, to)), by = c('id', 'type')]

However, this gives me a different error:

Error in `[.data.frame`(data, , list(date = seq(from, to)), by = c("id",  : 
unused argument (by = c("id", "type"))

Any thoughts on how to go about resolving these errors (or using a different approach) would be greatly appreciated.

Recommended answer

1) by Here is a three-line answer using by from base R. First we convert the dates to "Date" class, giving data2. Then we apply f, which does the real work, to each row, and finally we rbind the resulting rows together:

data2 <- transform(data, from = as.Date(from), to = as.Date(to))

f <- function(x) with(x, data.frame(id, type, date = seq(from, to, by = "day")))
do.call("rbind", by(data2, 1:nrow(data2), f))
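
To see what f does to a single range before the pieces are stacked, here is a minimal self-contained sketch in base R. The data frame ranges and its values are made up for illustration (they are not the question's data), and expand_row is simply f under a longer, hypothetical name:

```r
# Toy input (hypothetical values): one date range per row, already of class Date.
ranges <- data.frame(
  id   = c(1, 1, 2),
  type = c("a", "b", "a"),
  from = as.Date(c("2009-02-21", "2009-02-25", "2009-05-19")),
  to   = as.Date(c("2009-02-23", "2009-02-26", "2009-05-19")),
  stringsAsFactors = FALSE
)

# Expand one row into one row per day; the scalar id and type are recycled
# by data.frame() to match the length of the date sequence.
expand_row <- function(x) {
  with(x, data.frame(id, type, date = seq(from, to, by = "day")))
}

# A single row: seq() receives scalar from/to, so it works.
first_piece <- expand_row(ranges[1, ])

# All rows: split by row index, expand each piece, then stack the pieces.
panel <- do.call("rbind", by(ranges, seq_len(nrow(ranges)), expand_row))
rownames(panel) <- NULL

print(first_piece)
print(panel)
```

Splitting by row index is what guarantees that seq() only ever sees one from and one to at a time, which is the condition the plyr attempt in the question violated by passing the whole data$from and data$to columns.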

2) data.table Using the same data2 with data.table we do it like this:

library(data.table)

dt <- data.table(data2)
dt[, list(id, type, date = seq(from, to, by = "day")), by = 1:nrow(dt)]

2a) data.table Alternatively, with dt from (2) and f from (1):

dt[, f(.SD), by = 1:nrow(dt)]

3) dplyr With dplyr this gives a warning but otherwise works, where data2 and f are from (1):

library(dplyr)
data2 %>% rowwise() %>% do(f(.))
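
Whichever variant is used, a quick sanity check is that the result has exactly one row per day in each range, that is, sum(to - from + 1) rows in total. The following is a rough sketch of that check in base R only; the toy data frame ranges and the name expected_rows are assumptions made for illustration, and the expansion reuses the row-wise idea of f from (1):

```r
# Toy input (hypothetical values, not the question's data).
ranges <- data.frame(
  id   = c(1, 2),
  type = c("a", "f"),
  from = as.Date(c("2009-02-21", "2009-05-25")),
  to   = as.Date(c("2009-02-23", "2009-05-29")),
  stringsAsFactors = FALSE
)

# Same row-expanding helper as f in (1).
f <- function(x) with(x, data.frame(id, type, date = seq(from, to, by = "day")))
panel <- do.call("rbind", by(ranges, seq_len(nrow(ranges)), f))

# Each range contributes (to - from) + 1 days, since both endpoints are included.
expected_rows <- sum(as.integer(ranges$to - ranges$from) + 1L)

# The expanded panel should have exactly that many rows and stay within
# the overall span of the input ranges.
stopifnot(nrow(panel) == expected_rows)
stopifnot(min(panel$date) == min(ranges$from), max(panel$date) == max(ranges$to))
```

A mismatch in the row count usually points at from and to still being date-times rather than dates, or at a range whose from is later than its to.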

Update: Some improvements.
