Expanding long format time series data with missing rows

Problem description

Suppose I have a data frame:

df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'), 
time = c(1,2,4,1,2,3,5,7,8), 
data = c(5,6,7,8,9,10,1,2,3))

What I want to do is insert rows into the data frame wherever a time step is missing from a group's sequence. In the example above, time = 3 is missing for group A and time = 6 is missing for group C (group B has no gap within its own range of 1 to 3). The inserted rows should have NA in the data column. How would I go about adding these additional rows? I need a generalized solution.

Note: I edited the question because an earlier version contained an error. We cannot assume that each group has only 4 observations.

The goal is:

  df <- data.frame(group = c('A','A','A','A','B','B','B','C','C','C','C'), 
    time = c(1,2,3,4,1,2,3,5,6,7,8), 
    data = c(5,6,NA,7,8,9,10,1,NA,2,3))

Recommended answer

Here is one option using data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), build the full sequence of 'time' values from min to max within each 'group', and join the original data onto that expanded table on the 'group' and 'time' columns. Any (group, time) pair with no match in the original data gets NA in 'data'.

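library(data.table)
# Inner df[...]: one row per time step from each group's min to max 'time'.
# Outer join: look up the original rows for every (group, time) pair;
# pairs with no match get NA in 'data'. Note that setDT() converts df in place.
setDT(df)[df[, .(time = min(time):max(time)), by = group], on = c("group", "time")]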
#    group time data
# 1:     A    1    5
# 2:     A    2    6
# 3:     A    3   NA
# 4:     A    4    7
# 5:     B    1    8
# 6:     B    2    9
# 7:     B    3   10
# 8:     C    5    1
# 9:     C    6   NA
#10:     C    7    2
#11:     C    8    3
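
The same idea can also be sketched in base R, without data.table, for readers who want to see the two steps (build the grid, then left-join) spelled out separately. This is only an illustrative alternative, not part of the original answer; the names full_grid and expanded are made up for this example, and it assumes that 'time' advances in steps of 1.

```r
# Example data from the question
df <- data.frame(group = c('A','A','A','B','B','B','C','C','C'),
                 time  = c(1,2,4,1,2,3,5,7,8),
                 data  = c(5,6,7,8,9,10,1,2,3))

# Step 1: for each group, list every time step from its own min to its own max
# (assumes a step size of 1; full_grid is a hypothetical helper name)
full_grid <- do.call(rbind, lapply(split(df, df$group), function(g) {
  data.frame(group = g$group[1],
             time  = seq(min(g$time), max(g$time), by = 1))
}))

# Step 2: left-join the original rows onto the grid;
# (group, time) pairs absent from df end up with NA in 'data'
expanded <- merge(full_grid, df, by = c("group", "time"), all.x = TRUE)

# Keep rows ordered by group and time, and reset the row names
expanded <- expanded[order(expanded$group, expanded$time), ]
rownames(expanded) <- NULL
```

With the example data, expanded has 11 rows, with NA in 'data' for group A at time 3 and group C at time 6. Because the range is computed per group, group B is not padded out to time 4.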
