Insert missing time rows into a dataframe

Views: 61 Published: 2020/5/9 23:12:36 Tags: r time-series missing-data


Problem description

Let's say I have a dataframe:

df <- data.frame(group = c('A','A','A','B','B','B'), 
                 time = c(1,2,4,1,2,3),
                 data = c(5,6,7,8,9,10))

What I want to do is insert rows into the data frame wherever a time value is missing from the sequence. In the example above, group A has no row for time = 3 and group B has no row for time = 4. For each of these new rows, I want the data column to be 0.

How would I go about adding these additional rows?

The goal would be:

df <- data.frame(group = c('A','A','A','A','B','B','B','B'), 
                 time = c(1,2,3,4,1,2,3,4),
                 data = c(5,6,0,7,8,9,10,0))

My real data is a couple thousand data points, so manually doing so isn't possible.

Solution

You can try merge/expand.grid

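 # Build every group/time combination, then outer-join the original data onto it
 res <- merge(
   expand.grid(group = unique(df$group), time = unique(df$time)),
   df, all = TRUE)
 # Combinations that had no row in df come back as NA; set them to 0
 res$data[is.na(res$data)] <- 0
 res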
 #  group time data
 #1     A    1    5
 #2     A    2    6
 #3     A    3    0
 #4     A    4    7
 #5     B    1    8
 #6     B    2    9
 #7     B    3   10
 #8     B    4    0
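
To see why this works, it can help to look at the intermediate steps one at a time. The following is a minimal base R sketch of the same approach; the names `grid`, `merged` and `n_missing` are introduced here for illustration only, and `stringsAsFactors = FALSE` is added so that `group` stays a character column.

```r
df <- data.frame(group = c('A','A','A','B','B','B'),
                 time = c(1,2,4,1,2,3),
                 data = c(5,6,7,8,9,10))

# Step 1: every combination of observed group and observed time (2 x 4 rows)
grid <- expand.grid(group = unique(df$group), time = unique(df$time),
                    stringsAsFactors = FALSE)

# Step 2: join on the shared columns (group, time); combinations that have
# no matching row in df get NA in the data column
merged <- merge(grid, df, all = TRUE)
n_missing <- sum(is.na(merged$data))

# Step 3: replace those NA values with 0
merged$data[is.na(merged$data)] <- 0
merged
```

Here `grid` has 8 rows and `n_missing` is 2, one for each group/time combination that was absent from `df`.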

Or using data.table

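 library(data.table)
 # setDT() converts df by reference; setkey() keys it on group and time.
 # CJ() builds every group/time combination; joining on it adds the missing rows as NA.
 # The trailing [] prints the result, since := returns invisibly.
 setkey(setDT(df), group, time)[CJ(group=unique(group), time=unique(time))
                     ][is.na(data), data:=0L][]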
 #    group time data
 #1:     A    1    5
 #2:     A    2    6
 #3:     A    3    0
 #4:     A    4    7
 #5:     B    1    8
 #6:     B    2    9
 #7:     B    3   10
 #8:     B    4    0

Update

As @thelatemail mentioned in the comments, the above method fails if a particular 'time' value is absent from every group, because unique(df$time) only contains values that occur at least once. Something like this may be more general:

 res <- merge(
   expand.grid(group = unique(df$group),
               time = min(df$time):max(df$time)),
   df, all = TRUE)
 res$data[is.na(res$data)] <- 0

Similarly, in the data.table solution, replace time=unique(time) with time=min(time):max(time).
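
The difference between the two variants only shows up when a time value is missing from every group. The sketch below uses a made-up data frame `df2` in which time = 3 never occurs, and a hypothetical helper `fill_missing()` that wraps the merge/expand.grid steps; neither name comes from the original answer. It assumes time moves in integer steps, since `min:max` generates a sequence with step 1.

```r
# Hypothetical data: time = 3 is missing from BOTH groups
df2 <- data.frame(group = c('A','A','B','B'),
                  time  = c(1, 4, 2, 4),
                  data  = c(5, 7, 9, 10))

# Illustrative helper: complete d over the given time values, filling data with 0
fill_missing <- function(d, times) {
  grid <- expand.grid(group = unique(d$group), time = times,
                      stringsAsFactors = FALSE)
  res <- merge(grid, d, all = TRUE)
  res$data[is.na(res$data)] <- 0
  res
}

# Only the observed times (1, 4, 2) are used, so time = 3 is never created
by_unique <- fill_missing(df2, unique(df2$time))

# The full range 1:4 is used, so time = 3 is created for both groups
by_range <- fill_missing(df2, min(df2$time):max(df2$time))

3 %in% by_unique$time
3 %in% by_range$time
```

With this data, `by_unique` has 6 rows and no time = 3, while `by_range` has 8 rows and includes time = 3 for both groups.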
