Interpolate data for irregular time series

Problem description

I am trying to interpolate the meter_value column of this data. The full CSV is here: https://drive.google.com/open?id=18cwtw-chAB-FqqCesXZJ-6NB6eHFJlgQ

localminute,dataid,meter_value
2015-10-03 09:51:53,6578,157806
2015-10-13 13:41:49,6578,158086
:
:
2016-01-17 16:00:33,6578,164544  #end of meter_value data for ID=6578

Based on what @G. Grothendieck suggested, I tried the code below and got an error at z.interpolated (the step that merges the data):

library(zoo)
library(xts)  # provides to.daily()

D6578z <- read.csv.zoo("test_6578.csv")[, 2]
D6578zd <- to.daily(D6578z)[, 4]
# Warning messages:
# 1: In zoo(xx, order.by = index(x), ...) : some methods for "zoo" objects do not work if the index entries in ‘order.by’ are not unique
# 2: In zoo(rval, index(x)[i]) : some methods for "zoo" objects do not work if the index entries in ‘order.by’ are not unique

test_6578t <- time(D6578zd)

plot(D6578zd, type = "p", xaxt = "n", pch = 19, col = "blue", cex = 1.5)

diff(test_6578t)

# Daily grid spanning the observed range
t.daily6578 <- seq(from = min(test_6578t), to = max(test_6578t), by = "1 day")

# Empty zoo series that carries only the daily index
dummy6578 <- zoo(, t.daily6578)

z.interpolated <- merge(D6578zd, dummy6578, all = TRUE)
# Error in merge.zoo(D6578zd, dummy6578, all = TRUE) : series cannot be merged with non-unique index entries in a series
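The error message points at the duplicate index entries that the earlier warnings already reported. The following is a minimal sketch of that situation on made-up readings, not on test_6578.csv; the names dup, uniq, grid and merged are illustrative only. It assumes that keeping the last reading per day is acceptable, and shows that the merge goes through once the index is unique.

```r
library(zoo)

# Hypothetical readings: two of them fall on the same day
idx  <- as.Date(c("2015-10-03", "2015-10-03", "2015-10-05"))
vals <- c(157806, 157810, 157830)

# zoo() warns about the non-unique index, so the warning is silenced here
dup <- suppressWarnings(zoo(vals, idx))

# Collapse duplicates by keeping the last reading for each date
uniq <- aggregate(dup, index(dup), function(x) tail(x, 1))

# Empty series that carries only a complete daily index
grid <- zoo(, seq(start(uniq), end(uniq), by = "day"))

# The merge now succeeds; days without a reading become NA
merged <- merge(uniq, grid, all = TRUE)
```

Here merged has one entry per day from 2015-10-03 to 2015-10-05, with NA on the day that had no reading.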

@G. Grothendieck then provided R code that interpolates the data at one-hour intervals; it is reproduced below.

Hi @G. Grothendieck, thanks for the solution code. I have some questions about it.

line1: to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))

line2: z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))

line3: zz <- na.approx(as.zoo(as.ts(z)))

line4: time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")

In line1, why is as.POSIXct applied again outside trunc(as.POSIXct(x, origin = "1970-01-01"), "hour")?
As I understand it, trunc rounds the datetime value down to the hour.

In line2, what does FUN = to.hour, aggregate = function(x) tail(x, 1) do?

I could not understand what tail(x, 1) does. When I exported z to a CSV file, I observed that read.csv.zoo produces only the dataid and meter_value columns.

In line3, I understand that zz holds the interpolated data, but I do not fully understand na.approx(as.zoo(as.ts(z))). Since z is already a zoo series after read.csv.zoo, why do we still need as.ts and as.zoo inside the na.approx call?

What is the difference between a zoo and a zooreg series?

In line4, is time(zz) the index of zz?

Thanks in advance for your explanation.

I was able to plot the interpolated data at one-hour intervals.

Solution

Read the file with read.csv.zoo, converting the index to Date class and aggregating duplicate dates so that the last value for each date is used. Then convert to ts and back to zoo, which fills in the empty days with NAs. Now use na.approx to fill in the NA values by linear interpolation. Since ts cannot represent the Date class, the resulting series will have numbers representing dates as its index, so convert them back to Date.

library(zoo)
z <- read.csv.zoo("test_6578.csv", FUN = as.Date, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.Date(time(zz))
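
To see what each step contributes, here is a small sketch on made-up readings (the values and the names csv_text, mv and grid are hypothetical, not taken from test_6578.csv). Instead of the ts round trip it uses a swapped-in technique, merging with an explicit daily grid, which is assumed to serve the same purpose of inserting NA rows for the missing days before na.approx fills them.

```r
library(zoo)

# Hypothetical readings in the same layout as the question's CSV
csv_text <- "localminute,dataid,meter_value
2015-10-03 09:51:53,6578,100
2015-10-03 17:20:10,6578,110
2015-10-05 08:00:00,6578,130
2015-10-08 12:30:00,6578,160"

# as.Date drops the time of day; aggregate keeps the last reading per date
z <- read.zoo(text = csv_text, header = TRUE, sep = ",",
              FUN = as.Date, aggregate = function(x) tail(x, 1))

# Work on the meter readings only
mv <- z[, "meter_value"]

# Merging with an empty daily series adds NA for every missing day
grid <- zoo(, seq(start(mv), end(mv), by = "day"))
mv_na <- merge(mv, grid)

# Linear interpolation over the NA entries
zz <- na.approx(mv_na)
```

With these made-up values, coredata(zz) is 110, 120, 130, 140, 150, 160, one value per day from 2015-10-03 to 2015-10-08. The index stays in Date class throughout, so no conversion back from numbers is needed in this variant.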

In the comments there was a claim that there are holes in the output, but that is not the case. The difference between successive times is always 1 and there are no NAs.

table(diff(time(zz)))
##   1 
## 106 

any(is.na(zz)) 
## [1] FALSE

any(is.na(time(zz)))
## [1] FALSE

Here is an example of doing this with one-hour instead of one-day differences.

to.hour <- function(x) as.POSIXct(trunc(as.POSIXct(x, origin = "1970-01-01"), "hour"))
z <- read.csv.zoo("test_6578.csv", FUN = to.hour, aggregate = function(x) tail(x, 1))
zz <- na.approx(as.zoo(as.ts(z)))
time(zz) <- as.POSIXct(time(zz), origin = "1970-01-01")

plot(zz[, 2], type = "p", pch = ".")
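
The nested calls in to.hour can be read step by step. The sketch below applies them to two made-up timestamps; the time zone is pinned to UTC only to make the result reproducible, which is an assumption and not part of the code above. In base R, trunc on a date-time returns a POSIXlt object, and the outer as.POSIXct converts it back to POSIXct.

```r
# Two hypothetical timestamps within the same hour
x <- as.POSIXct(c("2015-10-03 09:51:53", "2015-10-03 09:05:10"), tz = "UTC")

# trunc() drops minutes and seconds (rounds down) and returns POSIXlt
tr <- trunc(x, "hours")
class(tr)

# Convert back to POSIXct so the values can serve as a zoo index
hr <- as.POSIXct(tr)
format(hr, "%Y-%m-%d %H:%M:%S")
```

Both timestamps map to 2015-10-03 09:00:00, which is why the aggregate argument is needed to pick one reading per hour.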
