Interpolate missing values in a time series with a seasonal cycle


Problem description

I have a time series in which I want to intelligently interpolate the missing values. The value at a particular time is influenced by a multi-day trend as well as by its position in the daily cycle.

In this example, the 10th value of myzoo is missing:

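library(zoo)

# 16 observations at 6-hour intervals: 4 days x 4 readings per day
start <- as.POSIXct("2010-01-01")
freq <- as.difftime(6, units = "hours")
dayvals <- (1:4) * 10      # multi-day trend
timevals <- c(3, 1, 2, 4)  # position in the daily cycle
index <- seq(from = start, by = freq, length.out = 16)
obs <- rep(dayvals, each = 4) + rep(timevals, times = 4)
myzoo <- zoo(obs, index)
myzoo[10] <- NA            # the value to be interpolated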

If I had to implement this myself, I'd use some kind of weighted mean of close times on nearby days, or add a value for the day to a line fitted to the larger trend. But I hope a package or function that applies to this situation already exists?

Modified the code slightly to clarify my problem. There are na.* methods that interpolate from nearest neighbors, but in this case they do not recognize that the missing value falls at the time of day with the lowest value. Maybe the solution is to reshape the data to wide format and then interpolate, but I wouldn't like to completely disregard the contiguous values from the same day. It is worth noting that diff(myzoo, lag = 4) returns a vector of 10s (apart from the entries that involve the missing point). The solution may lie in some combination of reshape, na.spline, and diff.inv, but I just can't figure it out.

Here are three approaches that don't work:

EDIT2. Image produced using the following code.

# Assumes library(zoo) is loaded and obs/index are defined as in the first snippet
myzoo <- zoo(obs, index)
myzoo[10] <- NA # knock out the missing point
plot(myzoo, type = "o", pch = 16) # plot solid line
# Candidate fills for the gap, one per na.* method
points(na.approx(myzoo)[10], col = "red")
points(na.locf(myzoo)[10], col = "blue")
points(na.spline(myzoo)[10], col = "green")
myzoo[10] <- 31 # replace the missing point with the true value
lines(myzoo, type = "o", lty = 3, pch = 16) # dashed line over the gap
legend(x = "topleft",
       legend = c("na.spline", "na.locf", "na.approx"),
       col = c("green", "blue", "red"), pch = 1)
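
The wide-format idea mentioned in the question can be sketched as follows. This is only an illustration of that idea, not a recommended solution: it interpolates each time-of-day slot across days and, as the question already points out, ignores the contiguous values from the same day. The names wide and filled are made up for this sketch.

```r
library(zoo)

# Same values as in the question, with the 10th one missing
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
obs[10] <- NA

# Wide layout: one row per time-of-day slot, one column per day
wide <- matrix(obs, nrow = 4)

# Linear interpolation along each row, i.e. across days for a fixed slot;
# na.rm = FALSE keeps every row at full length
filled <- t(apply(wide, 1, na.approx, na.rm = FALSE))

# Back to the original long order; the 10th value comes out as 31 here
# because the multi-day trend in this toy example is exactly linear
as.vector(filled)[10]
```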

Recommended answer

Try this:

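# myzoo is the zoo series from the question, with myzoo[10] set to NA
x <- ts(myzoo, frequency = 4) # 4 observations per daily cycle
# Kalman-smoothed components; drop column 2 and sum the remaining ones
fit <- ts(rowSums(tsSmooth(StructTS(x))[, -2]))
tsp(fit) <- tsp(x) # give the fit the same time axis as x
plot(x)
lines(fit, col = 2) # fitted values in red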

The idea is to fit a basic structural model to the time series. The model is estimated with a Kalman filter, which copes with the missing value without any special treatment. A Kalman smoother is then used to estimate every point in the series, including any that were omitted.
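
For reference, the basic structural model as described in the R documentation for StructTS can be written as below, with s the number of observations per cycle (4 here). This assumes StructTS falls back to its "BSM" type because the series has frequency greater than 1. Under that assumption, the column dropped in the code above would be the slope component, so the fitted line is the level plus the seasonal term.

```latex
\begin{aligned}
y_t &= \mu_t + \gamma_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \nu_t + \xi_t, & \xi_t &\sim N(0, \sigma_\xi^2) \\
\nu_{t+1} &= \nu_t + \zeta_t, & \zeta_t &\sim N(0, \sigma_\zeta^2) \\
\gamma_{t+1} &= -\left(\gamma_t + \gamma_{t-1} + \dots + \gamma_{t-s+2}\right) + \omega_t, & \omega_t &\sim N(0, \sigma_\omega^2)
\end{aligned}
```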

I had to convert your zoo object to a ts object with frequency 4 in order to use StructTS. You may want to convert the fitted values back to zoo afterwards.
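
One possible way to do that conversion is sketched below. It rebuilds the example series so that it runs on its own, passes coredata(myzoo) to ts() instead of the zoo object itself, and overwrites only the missing positions with smoothed values. The names gaps and filled are made up for this sketch and are not part of the original answer.

```r
library(zoo)

# Rebuild the example series from the question
start <- as.POSIXct("2010-01-01")
index <- seq(from = start, by = as.difftime(6, units = "hours"), length.out = 16)
obs <- rep((1:4) * 10, each = 4) + rep(c(3, 1, 2, 4), times = 4)
myzoo <- zoo(obs, index)
myzoo[10] <- NA

# Fit and smooth as in the answer, on the plain data values
x <- ts(coredata(myzoo), frequency = 4)
fit <- rowSums(tsSmooth(StructTS(x))[, -2])

# Copy the series and fill only the gaps, so observed values stay untouched
# and the original POSIXct index is preserved
gaps <- is.na(coredata(myzoo))
filled <- myzoo
coredata(filled)[gaps] <- fit[gaps]

filled[10] # the interpolated value
```

Filling only the gaps is a design choice: replacing the whole series with the smoothed fit would also alter the observed points.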
