R: Filling time series values, but only within the last 12 months
Question
How can we forward-fill a time series in R, but only when the last non-NA value lies within the previous 12 months (or observations), and leave NA otherwise?
Sample data: variable is the original column and desired is the desired outcome. The NAs start after June 2016 and should be forward-filled, but only for 12 months. Once we reach July 2017, the last non-NA value is too long ago, so the result should be NA. That is why something like fill() alone will not do.
Minimal working example: consider the following, which uses na.locf with maxgap:
x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA)
zoo::na.locf(x, maxgap = 2, na.rm = FALSE)
When a run of NAs is longer than maxgap, na.locf fills nothing in that run. Instead, I would like the following output: NA, 1, 2, 3, 3, 3, 5, 6, 7, 7, 7, NA. In other words, with a gap of 2, at most two values should be filled after each observation, and any further NAs should stay NA.
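The count-based behaviour asked for here can be sketched in base R. This is only a minimal illustration, not a zoo feature: fill_limited and its max_fill argument are hypothetical names introduced for this example. The idea is to track the index of the most recent observed value and fill an NA only when it sits at most max_fill positions after that value.

```r
# Hypothetical helper (not part of zoo): forward-fill at most `max_fill`
# consecutive NAs after each observed value; later NAs in the run stay NA.
fill_limited <- function(x, max_fill) {
  observed <- !is.na(x)
  # Index of the most recent non-NA value at each position (0 if none yet)
  last_idx <- cummax(seq_along(x) * observed)
  # Number of positions since that value
  dist <- seq_along(x) - last_idx
  fillable <- !observed & last_idx > 0 & dist <= max_fill
  out <- x
  out[fillable] <- x[last_idx[fillable]]
  out
}

x <- c(NA, 1, 2, 3, NA, NA, 5, 6, 7, NA, NA, NA)
fill_limited(x, max_fill = 2)
# [1] NA  1  2  3  3  3  5  6  7  7  7 NA
```

Unlike maxgap, which skips a long run of NAs entirely, this fills the first max_fill positions of every run and leaves the rest untouched.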
Recommended answer
One option is to use tidyr::fill. The approach is to create two columns, desired and TempDate. desired takes the same value as variable, except that rows where variable is "" (blank) get NA. Similarly, TempDate takes the same value as date, but is NA for rows where variable is "".
Then fill both desired and TempDate, and set desired to NA wherever TempDate is more than 12 months older than date.
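library(tidyverse)
library(lubridate)

df %>%
  # Blank strings become NA so that fill() treats them as missing
  mutate(TempDate = as.Date(ifelse(variable == "", NA, date), origin = "1970-01-01"),
         desired = ifelse(variable == "", NA, variable)) %>%
  # Carry the last observed value and its date forward
  fill(desired, TempDate) %>%
  # Blank out filled values whose source observation is more than 12 months old
  mutate(desired = ifelse(date > (TempDate + months(12)), NA, desired)) %>%
  select(-TempDate)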
#          date variable desired
# 1  2016-01-01        1       1
# 2  2016-02-01        2       2
# 3  2016-03-01        3       3
# 4  2016-04-01        3       3
# 5  2016-05-01        3       3
# 6  2016-06-01       33      33
# 7  2016-07-01               33
# 8  2016-08-01               33
# 9  2016-09-01               33
# 10 2016-10-01               33
# 11 2016-11-01               33
# 12 2016-12-01               33
# 13 2017-01-01               33
# 14 2017-02-01               33
# 15 2017-03-01               33
# 16 2017-04-01               33
# 17 2017-05-01               33
# 18 2017-06-01               33
# 19 2017-07-01             <NA>
# 20 2017-08-01             <NA>
# 21 2017-09-01       34      34
# 22 2017-10-01               34
Data: based on the image shared by the OP
df <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by="month"),
variable = c(1,2,3,3,3,33,rep("",14),34,""), stringsAsFactors = FALSE)
df
#          date variable
# 1  2016-01-01        1
# 2  2016-02-01        2
# 3  2016-03-01        3
# 4  2016-04-01        3
# 5  2016-05-01        3
# 6  2016-06-01       33
# 7  2016-07-01
# 8  2016-08-01
# 9  2016-09-01
# 10 2016-10-01
# 11 2016-11-01
# 12 2016-12-01
# 13 2017-01-01
# 14 2017-02-01
# 15 2017-03-01
# 16 2017-04-01
# 17 2017-05-01
# 18 2017-06-01
# 19 2017-07-01
# 20 2017-08-01
# 21 2017-09-01       34
# 22 2017-10-01
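For readers who prefer to avoid the tidyverse dependency, the same 12-month rule can be sketched in base R. This is an illustrative variant under stated assumptions, not the answer's method: fill_within_months and n_months are hypothetical names, and the age of a value is measured in whole calendar months (the day of the month is ignored), which suits monthly data like this sample.

```r
# Sample data mirroring the answer's df (blank strings mark missing values)
df <- data.frame(
  date = seq(as.Date("2016-01-01"), as.Date("2017-10-01"), by = "month"),
  variable = c(1, 2, 3, 3, 3, 33, rep("", 14), 34, ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last observed value forward while it is
# at most `n_months` calendar months old; otherwise return NA.
fill_within_months <- function(date, value, n_months = 12) {
  observed <- !is.na(value) & value != ""
  # Index of the most recent observed value at each row (0 if none yet)
  last_idx <- cummax(seq_along(value) * observed)
  # Running month counter, so differences are ages in calendar months
  lt <- as.POSIXlt(date)
  month_no <- lt$year * 12L + lt$mon
  keep <- last_idx > 0
  keep[keep] <- (month_no[keep] - month_no[last_idx[keep]]) <= n_months
  out <- value
  out[!keep] <- NA
  out[keep] <- value[last_idx[keep]]
  out
}

df$desired <- fill_within_months(df$date, df$variable)
df[17:22, ]
```

On this sample, rows up to 2017-06-01 carry 33 forward, 2017-07-01 and 2017-08-01 become NA, and the last two rows hold 34, matching the tidyr-based result.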