Trouble finding non-unique index entries in zooreg time series

Question

I have several years of data that I'm trying to work into a zoo object (.csv at Dropbox). I get the warning below once the data is coerced into a zoo object, but I cannot find any duplicates in the index.

df <- read.csv(choose.files(default = "", caption = "Select data source", multi = FALSE), na.strings="*")
df <- read.zoo(df, format = "%Y/%m/%d %H:%M", regular = TRUE, row.names = FALSE, col.names = TRUE, index.column = 1)
Warning message:
In zoo(rval3, ix) :
  some methods for "zoo" objects do not work if the index entries in ‘order.by’ are not unique

I've tried:

sum(duplicated(df$NST_DATI))

but the result is 0.

Thanks for your help!

Recommended answer

You are using read.zoo(...) incorrectly. According to the documentation:

To process the index, read.zoo calls FUN with the index as the first argument. If FUN is not specified then if there are multiple index columns they are pasted together with a space between each. Using the index column or pasted index column:
1. If tz is specified then the index column is converted to POSIXct.
2. If format is specified then the index column is converted to Date.
3. Otherwise, a heuristic attempts to decide among "numeric", "Date" and "POSIXct".
If format and/or tz is specified then they are passed to the conversion function as well.

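You are specifying format=... without tz, so read.zoo(...) converts the index to Date, not POSIXct. The time of day is dropped, every observation within a day ends up with the same index value, and so there are many, many duplicated dates. The raw timestamp strings are still unique, which is presumably why duplicated() on the original column returns 0.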
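A minimal base-R sketch of that collapse, independent of zoo. The three timestamps are made up for illustration and are not taken from the original .csv; the only point is that as.Date() with this format keeps the date and discards the hours and minutes.

```r
# Hypothetical timestamps in the same layout as the question's index column
stamps <- c("2010/03/14 01:00", "2010/03/14 02:00", "2010/03/14 03:00")

# The raw strings are all distinct, so duplicated() finds nothing
n_dup_raw <- sum(duplicated(stamps))

# Converting to Date drops the time of day, so all three entries
# collapse onto the same calendar day
as_date <- as.Date(stamps, format = "%Y/%m/%d %H:%M")
n_dup_date <- sum(duplicated(as_date))

print(n_dup_raw)   # 0
print(n_dup_date)  # 2
```

Checking for duplicates on the converted index rather than on the raw column is what exposes the problem.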

Naively, the fix would be to use:

df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") : 
#   index has bad entries at data rows: 507 9243 18147 26883 35619 44355

but as you can see, this does not work either. Here the problem is much more subtle. The index is converted to POSIXct, but in the system time zone (US Eastern on my system). The rows listed in the error have timestamps that fall in the changeover from Standard to Daylight Saving Time, so those times do not exist in the US Eastern time zone. If you use:

df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")

the data imports correctly.
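To see the time zone effect in isolation, the sketch below parses a single timestamp that falls inside a US spring-forward gap. The timestamp and the zone name America/New_York are illustrative assumptions, not values from the dataset. How R reports a nonexistent local time can depend on the platform and R version (the answer above saw such rows rejected as bad entries), so no output is claimed for the local-zone call.

```r
# Hypothetical timestamp: on 2010-03-14, 02:00-03:00 was skipped in US Eastern
gap <- "2010/03/14 02:30"
fmt <- "%Y/%m/%d %H:%M"

# Local-zone parse: this wall-clock time does not exist in the zone,
# so the result may be NA or otherwise adjusted, depending on the system
local_parsed <- as.POSIXct(gap, format = fmt, tz = "America/New_York")

# UTC has no daylight saving transitions, so every wall-clock time is valid
utc_parsed <- as.POSIXct(gap, format = fmt, tz = "UTC")

print(is.na(local_parsed))
print(utc_parsed)
```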

EDIT:

As @G.Grothendieck points out, this would also work, and is simpler:

df <- read.zoo(df, tz="UTC")

You should set tz to whatever time zone is appropriate for the dataset.
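As a small end-to-end check of the tz approach, here is a sketch that assumes the zoo package is installed. The data frame is invented for illustration: the column name NST_DATI comes from the question, while the value column and its contents are hypothetical.

```r
library(zoo)

# Hypothetical stand-in for the question's .csv: timestamps in column 1
raw <- data.frame(
  NST_DATI = c("2010/03/14 01:30", "2010/03/14 02:30", "2010/03/14 03:30"),
  value    = c(1.0, 2.0, 3.0)
)

# Specifying tz makes read.zoo convert the index column to POSIXct
z <- read.zoo(raw, tz = "UTC")

print(class(index(z)))          # index class after conversion
print(anyDuplicated(index(z)))  # 0 when every index entry is unique
```

Running anyDuplicated() on index(z), rather than on the raw column, confirms that the imported series has a unique index.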
