Interpolate missing values in a time series with a seasonal cycle


Problem description

I have a time series in which I want to intelligently interpolate the missing values. The value at a particular time is influenced by a multi-day trend as well as by its position in the daily cycle.

Here is an example in which the tenth value of myzoo is missing:

library(zoo)

start <- as.POSIXct("2010-01-01")
freq <- as.difftime(6, units = "hours")  # four observations per day
dayvals <- (1:4) * 10                    # multi-day trend
timevals <- c(3, 1, 2, 4)                # daily cycle
index <- seq(from = start, by = freq, length.out = 16)
obs <- rep(dayvals, each = 4) + rep(timevals, times = 4)
myzoo <- zoo(obs, index)
myzoo[10] <- NA  # the value to be interpolated

If I had to implement this myself, I'd use some kind of weighted mean of nearby times on neighboring days, or add a time-of-day value to a line fitted to the larger trend. But I hope a package or function that handles this situation already exists?

I modified the code slightly to clarify my problem. There are na.* methods that interpolate from nearest neighbors, but here they do not recognize that the missing value falls at the time of day with the lowest value. Maybe the solution is to reshape the data to wide format and then interpolate, but I would rather not completely disregard the contiguous values from the same day. It is worth noting that diff(myzoo, lag = 4) returns a vector of 10s. The solution may lie in some combination of reshape, na.spline, and diff.inv, but I just can't figure it out.
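The lag-4 observation above suggests one very simple baseline: fill a gap from the values at the same time of day on the neighboring days. The following is only a minimal sketch of that idea, not the method recommended in the answer. The helper fill_from_same_phase is a hypothetical name, not a zoo function, and it deliberately ignores the contiguous same-day values that the question would prefer to use as well.

```r
library(zoo)

# Rebuild the example series from the question (time zone fixed for reproducibility).
idx <- seq(from = as.POSIXct("2010-01-01", tz = "UTC"),
           by = as.difftime(6, units = "hours"), length.out = 16)
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
myzoo <- zoo(obs, idx)
myzoo[10] <- NA

# Differences at a lag of one day (4 observations).
print(diff(myzoo, lag = 4))

# Hypothetical helper: replace each NA with the mean of the observations
# one full period before and one full period after it, where available.
fill_from_same_phase <- function(z, period) {
  v <- coredata(z)
  n <- length(v)
  for (i in which(is.na(v))) {
    neighbours <- c(if (i > period) v[i - period],
                    if (i + period <= n) v[i + period])
    v[i] <- mean(neighbours, na.rm = TRUE)
  }
  zoo(v, time(z))
}

filled <- fill_from_same_phase(myzoo, period = 4)
print(filled[10])
```

Because the example data have a perfectly linear trend and a fixed daily pattern, this recovers the original value at position 10. Real data with a changing trend or noise would need something more robust.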

Here are three approaches that don't work:

EDIT2. Image produced using the following code.

myzoo <- zoo(obs, index)
myzoo[10] <- NA # knock out the missing point
plot(myzoo, type="o", pch=16) # plot solid line
points(na.approx(myzoo)[10], col = "red")
points(na.locf(myzoo)[10], col = "blue")
points(na.spline(myzoo)[10], col = "green")
myzoo[10] <- 31 # replace the missing point
lines(myzoo, type = "o", lty=3, pch=16) # dashed line over the gap
legend(x = "topleft", 
       legend = c("na.spline", "na.locf", "na.approx"), 
       col=c("green","blue","red"), pch = 1)
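Since the plot itself is not reproduced here, a small sketch that prints the three candidate values next to the true value (31) may help. It rebuilds the example series under the assumption of a UTC time zone and uses only the zoo functions already named in the question.

```r
library(zoo)

# Rebuild the example series from the question.
idx <- seq(from = as.POSIXct("2010-01-01", tz = "UTC"),
           by = as.difftime(6, units = "hours"), length.out = 16)
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
myzoo <- zoo(obs, idx)
myzoo[10] <- NA

# Value each method proposes for the missing tenth observation.
candidates <- c(
  na.approx = coredata(na.approx(myzoo))[10],  # linear between 33 and 32
  na.locf   = coredata(na.locf(myzoo))[10],    # carries the previous value (33) forward
  na.spline = coredata(na.spline(myzoo))[10]   # cubic spline through all points
)
print(candidates)
print(candidates - 31)  # error relative to the true value
```

None of these methods knows that the gap falls at the daily minimum, which is why the neighbor-based estimates land above the true value.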

Recommended answer

Try this:

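# myzoo[10] must be NA here; reset it if you ran the plotting code above
x <- ts(myzoo, frequency = 4)  # four observations per daily cycle
# Fit a basic structural model, then add the smoothed level and seasonal
# components (column 2, the slope, is dropped)
fit <- ts(rowSums(tsSmooth(StructTS(x))[, -2]))
tsp(fit) <- tsp(x)  # give the fit the same time attributes as x
plot(x)
lines(fit, col = 2)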

The idea is to fit a basic structural model to the time series; its Kalman filter handles the missing value without trouble. A Kalman smoother is then used to estimate every point in the series, including any that were omitted.

I had to convert your zoo object to a ts object with frequency 4 in order to use StructTS. You may want to convert the fitted values back to zoo afterwards.
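One possible way to do that conversion is sketched below, under the assumption that only the missing observations should be replaced and the observed values kept as they are. It passes coredata(myzoo) to ts() and rebuilds the series with a UTC index so that it runs on its own. The names fit_values, gaps, and filled are illustrative.

```r
library(zoo)

# Rebuild the example series from the question.
idx <- seq(from = as.POSIXct("2010-01-01", tz = "UTC"),
           by = as.difftime(6, units = "hours"), length.out = 16)
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
myzoo <- zoo(obs, idx)
myzoo[10] <- NA

# Fit the basic structural model and sum the smoothed level and seasonal
# components, as in the answer (column 2, the slope, is dropped).
x <- ts(coredata(myzoo), frequency = 4)
fit_values <- as.numeric(rowSums(tsSmooth(StructTS(x))[, -2]))

# Keep the observed values and use the smoothed estimate only in the gaps.
vals <- coredata(myzoo)
gaps <- is.na(vals)
vals[gaps] <- fit_values[gaps]

# Re-attach the original POSIXct index.
filled <- zoo(vals, time(myzoo))
print(filled[gaps])
```

Replacing only the gaps keeps the result identical to the input wherever data exist. Using the full smoothed series instead would also alter the observed points.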
