How to create "NA" for missing data in a time series
Problem description
I have several data files that look like this:
X code year month day pp
1 4515 1953 6 1 0
2 4515 1953 6 2 0
3 4515 1953 6 3 0
4 4515 1953 6 4 0
5 4515 1953 6 5 3.5
Sometimes data are missing, but there are no NAs; the rows simply don't exist. I need to create NAs where the data are missing. I thought I could start by identifying where that happens by converting the data to a zoo object and checking for strict regularity (I have never used zoo before). I used the following code:
z.date<-paste(CET$year, CET$month, CET$day, sep="/")
z <- read.zoo(CET, order.by= z.date )
reg<-is.regular(z, strict = TRUE)
But the result is always TRUE!
Can anyone tell me why this is not working? Or, even better, tell me a way to create NAs where the data are missing (with or without the zoo package)?
Thanks
Recommended answer
The seq function has some useful features for generating a complete sequence of dates (they are documented in ?seq.Date). For example, the following code generates 15 consecutive dates starting on April 25, 2011:
start <- as.Date("2011/04/25")
full <- seq(start, by = "1 day", length.out = 15)
full
[1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
[6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
[11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"
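When the observed dates are already in hand, the same function can span them from the first to the last day, and comparing the two vectors shows which days are absent. A minimal sketch, assuming a hypothetical vector `observed` that is not part of the original data:

```r
# Hypothetical observed dates with some days missing (illustrative only)
observed <- as.Date(c("2011-04-25", "2011-04-26", "2011-04-28", "2011-05-01"))

# Complete daily sequence from the first to the last observed date
expected <- seq(min(observed), max(observed), by = "day")

# Days that should be present but are not
missing_days <- expected[!expected %in% observed]
missing_days
# [1] "2011-04-27" "2011-04-29" "2011-04-30"
```

An empty `missing_days` would mean the series is already complete at daily resolution.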
Now use the same principle to generate some data with "missing" rows, by generating the sequence for every second day:
partial <- data.frame(
  date = seq(start, by = "2 day", length.out = 6),
  value = 1:6
)
partial
date value
1 2011-04-25 1
2 2011-04-27 2
3 2011-04-29 3
4 2011-05-01 4
5 2011-05-03 5
6 2011-05-05 6
To answer your question, you can use vector subscripting together with the match function to create a dataset with NAs:
with(partial, value[match(full, date)])
[1] 1 NA 2 NA 3 NA 4 NA 5 NA 6 NA NA NA NA
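The one-liner above works in two steps, which become visible when it is split up. A small sketch that rebuilds `full` and `partial` so it runs on its own; the name `idx` is introduced here purely for illustration:

```r
start   <- as.Date("2011/04/25")
full    <- seq(start, by = "1 day", length.out = 15)
partial <- data.frame(
  date  = seq(start, by = "2 day", length.out = 6),
  value = 1:6
)

# Step 1: for each date in `full`, look up its position in `partial$date`;
# dates that do not occur there get NA
idx <- match(full, partial$date)
idx
# [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA

# Step 2: subscripting with NA yields NA, so the gaps carry over to the values
partial$value[idx]
```

Because `idx` has one entry per element of `full`, the result always has the length of the complete date sequence.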
To combine this result with the full sequence of dates:
data.frame(Date=full, value=with(partial, value[match(full, date)]))
Date value
1 2011-04-25 1
2 2011-04-26 NA
3 2011-04-27 2
4 2011-04-28 NA
5 2011-04-29 3
6 2011-04-30 NA
7 2011-05-01 4
8 2011-05-02 NA
9 2011-05-03 5
10 2011-05-04 NA
11 2011-05-05 6
12 2011-05-06 NA
13 2011-05-07 NA
14 2011-05-08 NA
15 2011-05-09 NA
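Applied to a file laid out like the one in the question, the same idea might look as follows. This is only a sketch under assumptions: the data frame `CET` below is a small made-up stand-in that reuses the question's column names, and `merge` with `all.x = TRUE` is used in place of `match` as an alternative base R route to the same result.

```r
# Made-up stand-in for one of the question's files; June 3 and 4 are absent
CET <- data.frame(
  code  = 4515,
  year  = 1953,
  month = 6,
  day   = c(1, 2, 5, 6),
  pp    = c(0, 0, 3.5, 1.2)
)

# Build a proper Date column from the year, month and day columns
CET$date <- as.Date(paste(CET$year, CET$month, CET$day, sep = "-"))

# One row per calendar day between the first and last observation
all_days <- data.frame(date = seq(min(CET$date), max(CET$date), by = "day"))

# Left join: days with no matching row in CET get NA in every other column
filled <- merge(all_days, CET, by = "date", all.x = TRUE)
filled
```

Note that the `code`, `year`, `month` and `day` columns are also NA in the inserted rows; if they are needed, they can be refilled from `date` afterwards.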