R: Filling time-series values, but only within the last 12 months


Problem description

How can we forward-fill (pad) a time series in R, but only if the last non-NA value lies within the last 12 months/observations, and otherwise return NA?

Sample data: variable is the original series and desired is the desired outcome. After June 2016 we observe only NAs, which we want to forward-pad. However, I only want to do this for 12 months: as soon as we reach July 2017, the last non-NA value is too long ago and the result should be NA. That's why something like fill() alone will not do.
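To make the 12-month rule concrete, here is a small sketch of the cutoff arithmetic. It assumes the last observed value is dated 2016-06-01 (as in the reconstructed sample data) and that the month exactly 12 months later still counts as "within 12 months"; the variable names are illustrative only.

```r
# Assumed date of the last non-NA observation (June 2016 in the sample data)
last_obs <- as.Date("2016-06-01")

# Same calendar day 12 months later: the last date that may still be filled
cutoff <- seq(last_obs, by = "12 months", length.out = 2)[2]

# Check two candidate months against the cutoff
months_to_check <- as.Date(c("2017-06-01", "2017-07-01"))
fill_allowed <- months_to_check <= cutoff

print(cutoff)        # 2017-06-01
print(fill_allowed)  # TRUE FALSE: June 2017 is still filled, July 2017 stays NA
```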

Example

Minimal working example: consider the following, which uses na.locf with maxgap:

x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)
# maxgap = 2: any run of more than two consecutive NAs is left unfilled entirely
zoo::na.locf(x, maxgap = 2, na.rm = FALSE)

Instead of filling nothing when the number of consecutive NAs exceeds maxgap, I would like this output: NA, 1, 2, 3, 3, 3, 5, 6, 7, 7, 7, NA. So if I specify a gap of 2, I want at most two values filled after each observation, and any further NAs should stay NA.
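One way to get this behaviour in base R is sketched below. locf_limited is a hypothetical helper, not part of zoo or tidyr: for every position it records the index of the most recent non-NA value and copies that value forward only when it is at most gap observations back, so longer runs are partially filled instead of skipped.

```r
# Hypothetical helper: carry the last non-NA value forward, but at most `gap` steps
locf_limited <- function(x, gap) {
  idx <- seq_along(x)
  # Index of the most recent non-NA value at each position (0 before the first one)
  last_obs <- cummax(ifelse(is.na(x), 0L, idx))
  filled <- x
  # Fill only NAs that follow some observation and lie within `gap` steps of it
  ok <- is.na(x) & last_obs > 0 & (idx - last_obs) <= gap
  filled[ok] <- x[last_obs[ok]]
  filled
}

x <- c(NA, 1, 2, 3, NA, NA, 5, 6, 7, NA, NA, NA)
locf_limited(x, gap = 2)
# NA 1 2 3 3 3 5 6 7 7 7 NA
```

The leading NA stays NA because nothing precedes it, matching na.rm = FALSE in the zoo call.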

Recommended answer

One option is to use tidyr::fill. The approach is to create two helper columns, desired and TempDate. desired has the same value as variable, except that rows where variable is "" (blank) get NA. Similarly, TempDate has the same value as date, except that it is NA for rows where variable is "".

Then fill both desired and TempDate downward, and set desired back to NA wherever TempDate is more than 12 months earlier than date.

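library(tidyverse)
library(lubridate)

# df: the sample data defined in the Data section below
df %>%
  # TempDate: date of the row's own non-blank value (NA on blank rows before filling);
  # ifelse() drops the Date class, so convert back with as.Date()
  mutate(TempDate = as.Date(ifelse(variable == "", NA, date), origin = "1970-01-01"),
         # desired: copy of variable with blanks turned into NA so fill() can carry values down
         desired = ifelse(variable == "", NA, variable)) %>%
  fill(desired, TempDate) %>%
  # drop carried-forward values whose source observation is more than 12 months old
  mutate(desired = ifelse(date > (TempDate + months(12)), NA, desired)) %>%
  select(-TempDate)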

#          date variable desired
# 1  2016-01-01        1       1
# 2  2016-02-01        2       2
# 3  2016-03-01        3       3
# 4  2016-04-01        3       3
# 5  2016-05-01        3       3
# 6  2016-06-01       33      33
# 7  2016-07-01               33
# 8  2016-08-01               33
# 9  2016-09-01               33
# 10 2016-10-01               33
# 11 2016-11-01               33
# 12 2016-12-01               33
# 13 2017-01-01               33
# 14 2017-02-01               33
# 15 2017-03-01               33
# 16 2017-04-01               33
# 17 2017-05-01               33
# 18 2017-06-01               33
# 19 2017-07-01             <NA>
# 20 2017-08-01             <NA>
# 21 2017-09-01       34      34
# 22 2017-10-01               34
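For comparison, here is a minimal base-R sketch of the same rule without tidyverse or lubridate. It is not the answer's method: it assumes monthly data and measures the gap as a difference in calendar months via a hypothetical month_index helper, keeping a carried-forward value when that gap is at most 12.

```r
# Sample data as reconstructed in the answer (blank strings mark missing months)
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
                 variable = c(1, 2, 3, 3, 3, 33, rep("", 14), 34, ""),
                 stringsAsFactors = FALSE)

# Hypothetical helper: calendar months since year 0, so differences are month gaps
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon
}

observed <- df$variable != ""
row_id <- seq_len(nrow(df))

# Row of the most recent non-blank value at each position (0 = nothing observed yet)
last_row <- cummax(ifelse(observed, row_id, 0L))
src <- pmax(last_row, 1L)  # safe index for rows before the first observation

# Months elapsed since that observation
gap <- month_index(df$date) - month_index(df$date[src])

# Carry the value forward only if an observation exists and is at most 12 months old
df$desired <- ifelse(last_row > 0 & gap <= 12, df$variable[src], NA)
df
```

Because every date in this data is the first of a month, the calendar-month gap agrees with the answer's date > TempDate + months(12) test; for dates that are not month starts the two rules can differ by a few days.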

Data: based on the image shared by the OP

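# Mixing numbers with "" in c() makes variable a character column; "" marks a missing month
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
                 variable = c(1, 2, 3, 3, 3, 33, rep("", 14), 34, ""),
                 stringsAsFactors = FALSE)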

df
#          date variable
# 1  2016-01-01        1
# 2  2016-02-01        2
# 3  2016-03-01        3
# 4  2016-04-01        3
# 5  2016-05-01        3
# 6  2016-06-01       33
# 7  2016-07-01         
# 8  2016-08-01         
# 9  2016-09-01         
# 10 2016-10-01         
# 11 2016-11-01         
# 12 2016-12-01         
# 13 2017-01-01         
# 14 2017-02-01         
# 15 2017-03-01         
# 16 2017-04-01         
# 17 2017-05-01         
# 18 2017-06-01         
# 19 2017-07-01         
# 20 2017-08-01         
# 21 2017-09-01       34
# 22 2017-10-01         
