Add missing values in time series efficiently

Views: 177 | Published: 2020/5/9 23:18:11 | Tags: r, time-series, missing-data

Question

I have 500 datasets (panel data). Each one contains a weekly time series (week) for several shops (store). Within each shop, I need to add rows for the weeks that are missing from the series.

A sample of my data:

store   week           value
1           1          50
1           3          52
1           4          10
2           1          4
2           4          84
2           5          2

What I would like it to look like:

store   week        value
1           1       50
1           2       0
1           3       52
1           4       10
2           1       4
2           2       0
2           3       0
2           4       84
2           5       2

I currently use the following code (it works, but takes a very long time on my data):

stores <- unique(mydata$store)

# The loop body never uses i, so each pass repeats the same merge on the full table
for (i in 1:length(stores)) {
  mydata <- merge(
    expand.grid(week = min(mydata$week):max(mydata$week)),
    mydata, all = TRUE)
  mydata[is.na(mydata)] <- 0
}

Are there better and more efficient ways to do this?

Recommended answer

Here's a dplyr/tidyr option you could try:

library(dplyr); library(tidyr)
group_by(df, store) %>% 
  complete(week = full_seq(week, 1L), fill = list(value = 0)) 
#Source: local data frame [9 x 3]
#
#  store  week value
#  (int) (int) (dbl)
#1     1     1    50
#2     1     2     0
#3     1     3    52
#4     1     4    10
#5     2     1     4
#6     2     2     0
#7     2     3     0
#8     2     4    84
#9     2     5     2
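
The snippet above assumes a data frame named df, which the question does not define. As a minimal sketch for trying it out, the sample rows from the question can be typed in by hand; the names df and filled and the integer/double column types are assumptions made here for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (column types are an assumption)
df <- data.frame(
  store = c(1L, 1L, 1L, 2L, 2L, 2L),
  week  = c(1L, 3L, 4L, 1L, 4L, 5L),
  value = c(50, 52, 10, 4, 84, 2)
)

# Within each store, expand week to the full min..max sequence
# and set value to 0 on the rows that complete() adds
filled <- df %>%
  group_by(store) %>%
  complete(week = full_seq(week, 1L), fill = list(value = 0)) %>%
  ungroup()

print(filled)
```

The ungroup() call at the end is optional; it only drops the grouping so that later steps operate on the whole table.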

By default, if you don't specify the fill argument, the new rows are filled with NA. Since you seem to have many other columns, I would advise leaving out fill so that you end up with NAs and then, if required, adding another step with mutate_each to turn the NAs into 0 (if that's appropriate).

group_by(df, store) %>% 
  complete(week = full_seq(week, 1L)) %>%
  mutate_each(funs(replace(., which(is.na(.)), 0)), -store, -week)
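
mutate_each() and funs() are deprecated in current dplyr releases. On dplyr 1.0.0 or later, the same step could be sketched with across() and tidyr::replace_na(), as below. This is a substitute for the call above, not the original answer's code; the units column and the names df and filled_all are hypothetical, added only to show several value columns being filled at once:

```r
library(dplyr)
library(tidyr)

# Sample data from the question plus a hypothetical second value column (units)
df <- data.frame(
  store = c(1L, 1L, 1L, 2L, 2L, 2L),
  week  = c(1L, 3L, 4L, 1L, 4L, 5L),
  value = c(50, 52, 10, 4, 84, 2),
  units = c(5, 6, 1, 2, 9, 3)
)

filled_all <- df %>%
  group_by(store) %>%
  complete(week = full_seq(week, 1L)) %>%   # added rows get NA in every other column
  ungroup() %>%
  # Replace NA with 0 in every column except the keys
  mutate(across(-c(store, week), ~ replace_na(.x, 0)))

print(filled_all)
```

Note that this replaces every NA in the non-key columns, including any that were already present before complete() ran, so it only fits data where 0 is a sensible value for all of them.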
