Complete missing values in time series using previous day data - using R

Question

I have a data frame where each row is a different date and each column is a different time series.
The date range in the table is 01.01.2019-01.01.2021.
Some of the time series cover only part of that date range, and all of them have missing values on weekends and holidays.

How can I fill in the missing values of each time series with the previous day's value, but only within the dates that are relevant for that column? For example, if the time series in a specific column runs from 01.03.2019 to 01.09.2019, I want to fill only the missing values inside that date range.

I tried using the fill function:

data <- data %>%
  fill(colnames(data))

But it also fills in the missing values after a given time series has ended.

For example, the df is:

#  Date         time_series_1           time_series_2
1  01-01-2019               NA                      10
2  02-01-2019               5                       NA 
3  03-01-2019               10                      NA 
4  04-01-2019               20                      6 
5  05-01-2019               30                      NA 
6  06-01-2019               NA                      8 
7  07-01-2019               7                       NA 
8  08-01-2019               5                       NA 
9  09-01-2019               NA                      NA
10 10-01-2019               NA                      NA 

The desired output is:

#  Date         time_series_1           time_series_2
1  01-01-2019               NA                      10
2  02-01-2019               5                       10 
3  03-01-2019               10                      10 
4  04-01-2019               20                      6 
5  05-01-2019               30                      6 
6  06-01-2019               30                      8 
7  07-01-2019               7                       NA 
8  08-01-2019               5                       NA 
9  09-01-2019               NA                      NA
10 10-01-2019               NA                      NA 

Thanks!

Recommended answer

If I understand correctly, the trick is that you want to fill downward everywhere except for the trailing NAs at the bottom of each column. The problem with tidyr's fill is that it keeps filling all the way down to the last row.
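To make the over-filling concrete, here is a minimal sketch on a made-up one-column data frame (`toy` and its values are illustrative assumptions, not the asker's data). With the default downward direction, the leading NA is left alone but the two trailing NAs are overwritten, which is exactly the behaviour the question wants to avoid:

```r
library(tidyr)

# Hypothetical toy data: one leading NA, one gap, two trailing NAs
toy <- data.frame(value = c(NA, 5, NA, 7, NA, NA))

# fill() carries the last observed value downward by default
filled_toy <- fill(toy, value)
filled_toy$value
# NA 5 5 7 7 7
```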

This isn't a fully-tidyverse solution, but for this data:

library(dplyr)
library(tidyr)

# Example data from the question; its dd-mm-yyyy dates are consecutive days
data <- tribble(
  ~Date, ~time_series_1, ~time_series_2,
  as.Date("2019-01-01"), NA, 10,
  as.Date("2019-01-02"), 5, NA,
  as.Date("2019-01-03"), 10, NA,
  as.Date("2019-01-04"), 20, 6,
  as.Date("2019-01-05"), 30, NA,
  as.Date("2019-01-06"), NA, 8,
  as.Date("2019-01-07"), 7, NA,
  as.Date("2019-01-08"), 5, NA,
  as.Date("2019-01-09"), NA, NA,
  as.Date("2019-01-10"), NA, NA
)

You can determine the ending date for each time series separately:

# Last date on which each series has an observed (non-NA) value
LastTS1Date <- with(data, max(Date[!is.na(time_series_1)]))
LastTS2Date <- with(data, max(Date[!is.na(time_series_2)]))

Then use base R subsetting syntax to change only the rows of the data frame up to those dates:

# Fill each series downward, but only in rows up to its own last observed date
data[data$Date <= LastTS1Date, ] <-
  data[data$Date <= LastTS1Date, ] %>% fill(time_series_1)

data[data$Date <= LastTS2Date, ] <-
  data[data$Date <= LastTS2Date, ] %>% fill(time_series_2)
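The steps above need one variable and one assignment per column, which gets tedious with many time series. As a rough sketch of how the same idea might be generalized, the code below applies a per-column helper to every column except Date. Note the assumptions: `fill_within_range` is a hypothetical helper name (not part of dplyr or tidyr), it uses a base R forward fill instead of tidyr's fill, and it assumes the rows are sorted by Date. The `ts_data` table just reproduces the question's example.

```r
library(dplyr)

# Hypothetical helper: carry the last observed value forward, but leave
# NAs before the first and after the last observed value untouched.
fill_within_range <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) return(x)  # nothing observed, nothing to fill
  last_obs <- max(observed)
  # For each position, the index of the most recent non-NA value (0 if none yet)
  pos <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  fill_here <- pos > 0 & seq_along(x) <= last_obs
  x[fill_here] <- x[pos[fill_here]]
  x
}

# The question's example, with consecutive daily dates
ts_data <- tibble(
  Date = as.Date("2019-01-01") + 0:9,
  time_series_1 = c(NA, 5, 10, 20, 30, NA, 7, 5, NA, NA),
  time_series_2 = c(10, NA, NA, 6, NA, 8, NA, NA, NA, NA)
)

filled <- ts_data %>%
  arrange(Date) %>%                          # forward fill relies on date order
  mutate(across(-Date, fill_within_range))   # apply to every series column
```

On the example data, `filled` should match the desired output in the question: the gap in time_series_1 on 06-01-2019 becomes 30, while the trailing NAs of both series stay NA.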
