"Unnesting" a dataframe in R
Problem description
I have the following data.frame:
df <- data.frame(id = c(1, 2, 3),
                 first.date  = as.Date(c("2014-01-01", "2014-03-01", "2014-06-01")),
                 second.date = as.Date(c("2015-01-01", "2015-03-01", "2015-06-01")),
                 third.date  = as.Date(c("2016-01-01", "2017-03-01", "2018-06-01")),
                 fourth.date = as.Date(c("2017-01-01", "2018-03-01", "2019-06-01")))
> df
id first.date second.date third.date fourth.date
1 1 2014-01-01 2015-01-01 2016-01-01 2017-01-01
2 2 2014-03-01 2015-03-01 2017-03-01 2018-03-01
3 3 2014-06-01 2015-06-01 2018-06-01 2019-06-01
Each row represents three time spans, i.e. the spans between first.date and second.date, between second.date and third.date, and between third.date and fourth.date, respectively.
For lack of a better word, I would like to "unnest" the data frame to obtain this instead:
id StartDate EndDate
1 1 2014-01-01 2015-01-01
2 1 2015-01-01 2016-01-01
3 1 2016-01-01 2017-01-01
4 2 2014-03-01 2015-03-01
5 2 2015-03-01 2017-03-01
6 2 2017-03-01 2018-03-01
7 3 2014-06-01 2015-06-01
8 3 2015-06-01 2018-06-01
9 3 2018-06-01 2019-06-01
I have been playing around with the unnest function from the tidyr package, but I have come to the conclusion that it isn't really what I'm looking for.
Any suggestions?
Recommended answer
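You can try tidyr/dplyr as follows: gather the date columns into a single long column, then pair each date with the next one in the same id using lead():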
library(tidyr)
library(dplyr)
df %>% gather(DateType, StartDate, -id) %>% select(-DateType) %>%
  arrange(id) %>% group_by(id) %>%
  # lead() takes the next date within the same id as the end of the span
  mutate(EndDate = lead(StartDate))
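You can eliminate the last row in each id group, whose EndDate is NA because lead() has no following date, by appending:
%>% slice(-4)
to the pipeline above. The -4 works here because each id has exactly four dates.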
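As a hedged alternative, assuming tidyr 1.0.0 or later (where pivot_longer() supersedes gather()), the same reshaping can drop the trailing NA row with filter(!is.na(EndDate)) instead of a hard-coded slice(-4), so it does not depend on the number of date columns. This sketch relies on pivot_longer() keeping each id's date columns in their original left-to-right order; the names DateType and spans are illustrative choices, not part of the original answer.

```r
library(tidyr)
library(dplyr)

df <- data.frame(id = c(1, 2, 3),
                 first.date  = as.Date(c("2014-01-01", "2014-03-01", "2014-06-01")),
                 second.date = as.Date(c("2015-01-01", "2015-03-01", "2015-06-01")),
                 third.date  = as.Date(c("2016-01-01", "2017-03-01", "2018-06-01")),
                 fourth.date = as.Date(c("2017-01-01", "2018-03-01", "2019-06-01")))

spans <- df %>%
  # one row per (id, date column), in column order within each id
  pivot_longer(-id, names_to = "DateType", values_to = "StartDate") %>%
  select(-DateType) %>%
  group_by(id) %>%
  # the next date in the same id closes the span
  mutate(EndDate = lead(StartDate)) %>%
  # the last date of each id has no successor, so its EndDate is NA
  filter(!is.na(EndDate)) %>%
  ungroup()

print(spans)
```

One caveat of this design: filter(!is.na(EndDate)) would also drop a span whose end date is genuinely missing in the data, whereas slice(-4) removes strictly by position.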