Is it possible to use something like `tz=NULL`?... `as.POSIXct` defaults to locale-dependent timezone (unlike `as.Date`), which causes issues

Problem description

I know this is a long-standing, deeply embedded issue, but it's something I come up against so regularly, and that I see beginners to R struggle with so regularly, that I'd love to have a satisfactory solution. My google and SO searches have come up empty so far, but please point me in the right direction if this is duplicated elsewhere.

TL;DR: Is there a way to use something like the POSIXct class without a timezone? I generally use tz="UTC" regardless of the actual timezone of the dataset, but it's a messy hack IMO, and I don't particularly like it. What I want is something like tz=NULL, which would behave the same way as UTC, but without actually adding "UTC" as a tzone attribute.

I'll start with an example (there are plenty) of typical timezone issues. Creating an object with POSIXct values:

df <- data.frame( timestamp = as.POSIXct( c( "2018-01-01 03:00:00",
                                             "2018-01-01 12:00:00" ) ),
                  a = 1:2 )
df

#             timestamp a
# 1 2018-01-01 03:00:00 1
# 2 2018-01-01 12:00:00 2

That's all fine, but then I try to convert the timestamps to dates:

df$date <- as.Date( df$timestamp )
df

#             timestamp a       date
# 1 2018-01-01 03:00:00 1 2017-12-31
# 2 2018-01-01 12:00:00 2 2018-01-01

The dates have converted incorrectly, because my computer locale is in Australian Eastern Time, meaning that the numeric values of the timestamps have been shifted by the offset relevant to my locale (in this case -11hrs). We can see this by forcing the timezone to UTC, then comparing the values before and after:

df$timestamp[1]
# [1] "2018-01-01 03:00:00 AEDT"

x <- lubridate::force_tz( df$timestamp[1], "UTC" ); x
# [1] "2018-01-01 03:00:00 UTC"

difftime( df$timestamp[1], x )
# Time difference of -11 hours
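The same shift can be reproduced without depending on the machine's timezone setting. The sketch below uses base R only. It makes two assumptions. First, "Australia/Sydney" stands in for the asker's AEDT locale. Second, `as.POSIXct(format(x), tz = "UTC")` takes the place of `lubridate::force_tz(x, "UTC")`.

```r
# Assumption: "Australia/Sydney" stands in for the asker's AEDT system timezone.
x <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")

# The same instant written out in UTC: 11 hours earlier, on the previous day.
format(x, tz = "UTC")
# [1] "2017-12-31 16:00:00"

# Converting to Date in UTC (the as.Date.POSIXct default at the time of
# this question) therefore gives the previous day...
as.Date(x, tz = "UTC")
# [1] "2017-12-31"

# ...while converting in the object's own timezone keeps the printed date.
as.Date(x, tz = "Australia/Sydney")
# [1] "2018-01-01"

# Base-R stand-in for lubridate::force_tz(x, "UTC"): same clock time, new zone.
y <- as.POSIXct(format(x), tz = "UTC")
difftime(x, y, units = "hours")
# Time difference of -11 hours
```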

That's just one example of the issues caused by timezones. There are others, but I won't go into them here.

I don't want that behaviour, so I need to convince as.POSIXct not to mess with my timestamps. I generally do this by using tz="UTC", which works fine, except that I'm adding information to the data that isn't real. These times are NOT in UTC, I'm just saying that to avoid time-shift issues. It's a hack, and any time I give my data to someone else, they could be forgiven for thinking that the timestamps are in UTC when they're not. To avoid this, I generally add the actual timezone to the object/column name, and hope that anyone I pass my data on to will understand why someone would label an object with a timezone different to the one in the object itself:

df <- data.frame( timestamp.AET = as.POSIXct( c( "2018-01-01 03:00:00",
                                                 "2018-01-01 12:00:00" ),
                                              tz = "UTC" ),
                  a = 1:2 )
df$date <- as.Date( df$timestamp.AET )
df

#         timestamp.AET a       date
# 1 2018-01-01 03:00:00 1 2018-01-01
# 2 2018-01-01 12:00:00 2 2018-01-01
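The risk of the `tz = "UTC"` hack can be made concrete. Below is a minimal base-R sketch, again assuming "Australia/Sydney" as the real zone. The mislabelled timestamp sits 11 hours away from the instant it actually records. Knowing the true zone is enough to recover that instant.

```r
# Sydney wall-clock time, labelled "UTC" only to stop R from shifting it.
fake_utc <- as.POSIXct("2018-01-01 03:00:00", tz = "UTC")

# The instant the reading actually refers to (assumed zone: Australia/Sydney).
real <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")

# Taking the "UTC" label at face value is off by the full UTC offset.
difftime(fake_utc, real, units = "hours")
# Time difference of 11 hours

# If the true zone is known (e.g. from a column name like timestamp.AET),
# re-reading the same clock time in that zone recovers the real instant.
recovered <- as.POSIXct(format(fake_utc), tz = "Australia/Sydney")
recovered == real
# [1] TRUE
```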

What I'm hoping for

What I really want is a way to use POSIXct without having to specify a timezone. I don't want the times messed with in any way. Do everything as though the values were in UTC, and leave any timezone details like offsets, daylight savings, etc to the user. Just don't pretend they actually ARE in UTC. Here's my ideal:

x <- as.POSIXct( "2018-01-01 03:00:00" ); x
# [1] "2018-01-01 03:00:00"

attr( x, "tzone" )
# [1] NULL

shifted <- lubridate::force_tz( x, "UTC" )
shifted == x
# [1] TRUE

as.numeric( shifted ) == as.numeric( x )
# [1] TRUE

as.Date( x )
# [1] "2018-01-01"

So there's no timezone attribute on the object at all. The date conversion works as one would expect from the printed value. If there are daylight savings time-shifts, or any other locale-specific issues, the user (me or someone else) needs to deal with that themselves.
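For comparison, base R will hold a `POSIXct` with no "tzone" attribute. However, that does not mean "no timezone". When the attribute is absent, formatting and printing fall back to the session's current timezone (`tz = ""`), which is the same locale dependence described above. A small sketch:

```r
x <- as.POSIXct("2018-01-01 03:00:00", tz = "UTC")
attr(x, "tzone") <- NULL  # drop the timezone label entirely
is.null(attr(x, "tzone"))
# [1] TRUE

# With no label, formatting uses the session timezone, exactly as tz = "" does.
identical(format(x), format(x, tz = ""))
# [1] TRUE

# The stored instant is unchanged; asking for UTC explicitly shows it.
format(x, tz = "UTC")
# [1] "2018-01-01 03:00:00"
```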

I believe something similar to this is possible in POSIXlt, but I really don't want to shift to that. chron or another timeseries-oriented package might be another solution, but I think POSIXct is more widely used and accepted, and this seems like something that should be possible within base::. A POSIXct object with tz="UTC" is exactly what I need, I just don't want to have to lie about timezones in order to get it to behave the way I want (and I believe most beginners to R expect).

So what do others do here? Is there an easy way to use POSIXct without a timezone that I've missed? Is there a better work-around than tz="UTC"? Is that what others are doing?

Recommended answer

I wasn't sure I understood your question. After (re-)reading your post and the subsequent comments, I see what you mean.

To summarise:

as.POSIXct determines tz from your system, whereas the as.Date method for class POSIXct (as.Date.POSIXct) has default tz = "UTC". So unless your system timezone is UTC, dates may change. The solution is to pass tz to as.Date, or to change the behaviour of as.Date.POSIXct (see update below).

Case 1

If you don't specify an explicit tz with as.POSIXct, you can simply specify tz = "" with as.Date to enforce the system-specific timezone.

df <- data.frame(
    timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
    a = 1:2)

df$date <- as.Date(df$timestamp, tz = "")
df
#            timestamp a       date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01

Case 2

If you do set an explicit tz with as.POSIXct, you can extract tz from the POSIXct object (its "tzone" attribute) and pass it on to as.Date:

df <- data.frame(
    timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00"), tz = "UTC"),
    a = 1:2)

tz <- attr(df$timestamp, "tzone")
tz
#[1] "UTC"

df$date <- as.Date(df$timestamp, tz = tz)
df
#            timestamp a       date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
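Cases 1 and 2 can be folded into one small function. The name `as_date_own_tz()` is hypothetical; it is not part of base R or any package. It reads the object's "tzone" attribute and falls back to the session timezone (`tz = ""`) when the attribute is missing or empty:

```r
# Hypothetical helper (not a base R function): convert POSIXct to Date in the
# timezone stored on the object; a missing or "" tzone means the session zone.
as_date_own_tz <- function(x) {
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""
  as.Date(x, tz = tz[1L])  # tzone can have length > 1; the first entry is the zone
}

syd <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")
as_date_own_tz(syd)
# [1] "2018-01-01"
```

Unlike redefining as.Date.POSIXct, a separately named helper leaves base R's method untouched. Other code that relies on the UTC default is therefore unaffected.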

Update

There is a related discussion on Dirk Eddelbuettel's anytime GitHub project site. The discussion turns out to be somewhat circular, so I'm afraid it does not offer much in terms of understanding why as.Date.POSIXct does not inherit tz from POSIXct. I would probably call this a base R idiosyncrasy (or, as Dirk calls it: "[T]hese are known quirks in Base R").

As for a solution: I would change the behaviour of as.Date.POSIXct rather than the default behaviour of as.POSIXct.

We could simply redefine as.Date.POSIXct to inherit tz from the POSIXct object.

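# Default tz to the object's own "tzone" attribute (system timezone if absent);
# keep tz and ... so calls like as.Date(x, tz = "UTC") still work.
as.Date.POSIXct <- function(x, tz = attr(x, "tzone"), ...) {
    if (is.null(tz)) tz <- ""
    as.Date(as.POSIXlt(x, tz = tz[1L]))
}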

Then you get consistent results for your sample case:

df <- data.frame(
    timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
    a = 1:2)
df$date <- as.Date(df$timestamp)
df
#            timestamp a       date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
