Add missing values in time series efficiently
Question
I have 500 datasets (panel data). In each one I have a weekly time series (week) across different shops (store). Within each shop, I need to add the missing time series observations.
A sample of my data:
store week value
1 1 50
1 3 52
1 4 10
2 1 4
2 4 84
2 5 2
What I would like it to look like:
store week value
1 1 50
1 2 0
1 3 52
1 4 10
2 1 4
2 2 0
2 3 0
2 4 84
2 5 2
I currently use the following code (which works, but takes a very long time on my data):
stores <- unique(mydata$store)
for (i in 1:length(stores)) {
  mydata <- merge(
    expand.grid(week = min(mydata$week):max(mydata$week)),
    mydata, all = TRUE)
  mydata[is.na(mydata)] <- 0
}
Is there a better, more efficient way to do this?
Recommended answer
Here's a dplyr/tidyr option you could try:
library(dplyr); library(tidyr)
group_by(df, store) %>%
  complete(week = full_seq(week, 1L), fill = list(value = 0))
#Source: local data frame [9 x 3]
#
# store week value
# (int) (int) (dbl)
#1 1 1 50
#2 1 2 0
#3 1 3 52
#4 1 4 10
#5 2 1 4
#6 2 2 0
#7 2 3 0
#8 2 4 84
#9 2 5 2
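To see what full_seq() contributes inside each group, here is a small sketch using the week values of store 1 and store 2 from the question (the variable names weeks_store1 and weeks_store2 are only illustrative). It suggests why each store is filled between its own first and last week rather than over the global range:

```r
library(tidyr)

# Weeks observed for store 1 and store 2 in the sample data
weeks_store1 <- full_seq(c(1, 3, 4), 1)  # period of 1 week
weeks_store2 <- full_seq(c(1, 4, 5), 1)

print(weeks_store1)  # every week from 1 to 4
print(weeks_store2)  # every week from 1 to 5
```

Because complete() is called on the grouped data, this sequence is built separately for every store.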
By default, if you don't specify the fill parameter, new rows are filled with NA. Since you seem to have many other columns, I would advise leaving out the fill parameter so you end up with NAs and, if required, adding another step with mutate_each to turn the NAs into 0 (if that's appropriate).
group_by(df, store) %>%
  complete(week = full_seq(week, 1L)) %>%
  mutate_each(funs(replace(., which(is.na(.)), 0)), -store, -week)
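In more recent dplyr releases, mutate_each() and funs() are deprecated in favour of across(). The following is a minimal sketch of the same two-step approach, assuming dplyr 1.0.0 or later and rebuilding the sample data from the question as df (the name filled is only illustrative, not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  store = c(1, 1, 1, 2, 2, 2),
  week  = c(1, 3, 4, 1, 4, 5),
  value = c(50, 52, 10, 4, 84, 2)
)

filled <- df %>%
  group_by(store) %>%
  # Add the missing weeks within each store; new cells are NA
  complete(week = full_seq(week, 1)) %>%
  ungroup() %>%
  # Replace NA with 0 in every column except the two key columns
  mutate(across(-c(store, week), ~ replace_na(.x, 0)))

print(filled)
```

With many value columns, the across() step covers all of them at once. Whether 0 is a sensible replacement for every column is a judgment call that depends on the data.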