Convert date string that contains time zone to POSIXct in R

Question

I have a vector with dates in this format (the first six elements are shown as an example):

 Dates<-c(
   "Sun Oct 04 20:33:05 EEST 2015",
   "Sun Oct 04 20:49:23 EEST 2015",
   "Sun Oct 04 21:05:25 EEST 2015",
   "Mon Sep 28 10:02:38 IDT 2015", 
   "Mon Sep 28 10:17:50 IDT 2015",
   "Mon Sep 28 10:39:48 IDT 2015")

I tried to read the Dates variable into R using the as.Date() function:

as.Date(Dates,format = "%a %b %d %H:%M:%S %Z %Y")

but this failed because the %Z conversion specification is not supported for input. The time zones differ throughout the vector. What alternatives are there for reading the data correctly with respect to the time zone?
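
Two separate issues are mixed here: R's strptime() (which as.Date() uses for character input) supports %Z only for output, and as.Date() would discard the clock time even if parsing succeeded. A minimal sketch, assuming the "C" LC_TIME locale so that the English day and month names parse, that drops the abbreviation and compares as.Date() with as.POSIXct(); note that the POSIXct result simply treats the clock time as GMT and ignores the original zone:

```r
# Assumption: switch LC_TIME to "C" so %a/%b match English names like "Sun"/"Oct"
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

x <- "Sun Oct 04 20:33:05 EEST 2015"
# Remove the zone abbreviation (" EEST") so the format needs no %Z
x_no_zone <- sub(" [A-Z]+ ([0-9]{4})$", " \\1", x)

# as.Date() keeps only the calendar date; the clock time is lost
d <- as.Date(x_no_zone, format = "%a %b %d %H:%M:%S %Y")
# as.POSIXct() keeps the clock time, but the zone is merely assumed to be GMT
p <- as.POSIXct(x_no_zone, format = "%a %b %d %H:%M:%S %Y", tz = "GMT")

invisible(Sys.setlocale("LC_TIME", old_locale))
print(d)
print(p)
```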

Answer

This solution requires some simplifying assumptions. Assuming you have many elements in your vector, a practical approach is to use a database of time zone offsets to convert each time to a single reference time zone, such as GMT. The time zone data I used is the timezone.csv file from https://timezonedb.com/download
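
The core of the approach can be sketched without downloading the CSV, assuming a small hand-written lookup of offsets in seconds east of GMT (EEST and IDT are both UTC+3). The offsets vector is a stand-in for the database, not part of the original answer, and the abbreviation is extracted with a regular expression rather than fixed character positions:

```r
# Assumption: hand-written offsets (seconds east of GMT) instead of timezone.csv
offsets <- c(EEST = 3 * 3600, IDT = 3 * 3600)

Dates <- c("Sun Oct 04 20:33:05 EEST 2015",
           "Mon Sep 28 10:02:38 IDT 2015")

old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English %a/%b names

# The abbreviation is the upper-case word just before the year
abbr <- sub("^.* ([A-Z]+) [0-9]{4}$", "\\1", Dates)
# Same string with the abbreviation removed
local_str <- sub(" [A-Z]+ ([0-9]{4})$", " \\1", Dates)

# Read the local clock time as if it were GMT, then subtract the offset
local_as_gmt <- as.POSIXct(local_str, format = "%a %b %d %H:%M:%S %Y", tz = "GMT")
utc <- local_as_gmt - unname(offsets[abbr])  # unknown abbreviations give NA

invisible(Sys.setlocale("LC_TIME", old_locale))
print(utc)
```

Subtracting a plain number from a POSIXct value subtracts seconds, which is why the offsets are stored in seconds.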

# Create sample data
Dates <- c(
  "Sun Oct 04 20:33:05 EEST 2015",
  "Sun Oct 04 20:49:23 EEST 2015",
  "Sun Oct 04 21:05:25 EEST 2015",
  "Mon Sep 28 10:02:38 IDT 2015",
  "Mon Sep 28 10:17:50 IDT 2015",
  "Mon Sep 28 10:39:48 IDT 2015")

# Separate the time zone abbreviation from the date/time information
no_timezone <- paste(substr(Dates, 1, 19), substr(Dates, nchar(Dates) - 3, nchar(Dates)))
abbreviation <- substr(Dates, 21, nchar(Dates) - 5)

# Reference the time zone database to get offsets from GMT (in seconds)
timezone_db <- read.csv(file = "timezonedb/timezone.csv", header = FALSE,
                        stringsAsFactors = FALSE)
colnames(timezone_db) <- c("zone_id", "abbreviation", "time_start", "gmt_offset", "dst")
# Keep the first offset listed for each abbreviation. Summer-time abbreviations
# such as EEST and IDT are DST rows (dst = 1) whose offset already includes
# DST, so filtering on dst == 0 would leave them unmatched (NA).
timezone_db <- timezone_db[!duplicated(timezone_db$abbreviation),
                           c("abbreviation", "gmt_offset")]

# Look up each element's offset; match() preserves the order of Dates,
# whereas merge() returns rows sorted by abbreviation
gmt_offset <- timezone_db$gmt_offset[match(abbreviation, timezone_db$abbreviation)]

# Parse the local clock time as if it were GMT (%a and %b need an English
# LC_TIME locale, e.g. Sys.setlocale("LC_TIME", "C"))
gmt_time <- as.POSIXct(strptime(no_timezone, format = "%a %b %d %H:%M:%S %Y", tz = "GMT"))

# Final data: local time minus its offset east of GMT gives GMT
Dates_final <- gmt_time - gmt_offset
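
A detail worth checking in the join step: merge() returns its rows sorted by the by column (its sort argument defaults to TRUE), not in the order of the input, so offsets joined back with merge() line up with Dates only when the abbreviations already happen to be in sorted order (as in the sample, where EEST precedes IDT). match() keeps the input order. A small sketch with a hypothetical three-element input and a hypothetical lookup table; EET is UTC+2 and IDT is UTC+3, in seconds:

```r
# Hypothetical input order and lookup table (offsets in seconds east of GMT)
abbreviation <- c("IDT", "EET", "IDT")
lookup <- data.frame(abbreviation = c("EET", "IDT"),
                     gmt_offset   = c(7200, 10800),
                     stringsAsFactors = FALSE)

# merge() sorts the result by abbreviation, so rows no longer match the input
merged <- merge(data.frame(abbreviation = abbreviation, stringsAsFactors = FALSE),
                lookup, all.x = TRUE, by = "abbreviation")
print(merged$gmt_offset)  # 7200 10800 10800 (misaligned with the input)

# match() returns lookup positions in the order of its first argument
matched <- lookup$gmt_offset[match(abbreviation, lookup$abbreviation)]
print(matched)            # 10800 7200 10800 (aligned with the input)
```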

Depending on how exact your data needs to be, be careful to adjust for daylight saving time if necessary. Also, I don't know much about time zones, but I noticed that certain time zone abbreviations can have multiple offsets. In the original database, CLT (Chile time) appears with offsets ranging from 3 to 5 hours behind GMT, possibly because the database also records historical offset changes (see the time_start column).
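
If an abbreviation has several offsets, one way to choose among them, assuming time_start holds the Unix timestamp from which each row applies, is to take for each date the row with the latest time_start that is not after that date. The sketch below uses a hypothetical table (XST and YST are made-up abbreviations, not database rows) and a hypothetical helper offset_at(). It approximates "when" by reading the local clock time as GMT, which can only pick the wrong row within a few hours of a transition:

```r
# Hypothetical rows in the timezone.csv layout (time_start as Unix seconds)
tz_rows <- data.frame(abbreviation = c("XST", "XST", "YST"),
                      time_start   = c(0, 1.2e9, 0),
                      gmt_offset   = c(-18000, -14400, 3600),
                      stringsAsFactors = FALSE)

# Hypothetical helper: offset in effect for one abbreviation at one time
offset_at <- function(abbr, when_secs, rows) {
  # Rows for this abbreviation that had already started at when_secs
  cand <- rows[rows$abbreviation == abbr & rows$time_start <= when_secs, ]
  if (nrow(cand) == 0) return(NA_real_)
  # The most recently started row wins
  cand$gmt_offset[which.max(cand$time_start)]
}

# Approximation: local clock times read as GMT serve as "when"
when <- as.numeric(as.POSIXct(c("2005-06-01 12:00:00", "2015-06-01 12:00:00"),
                              tz = "GMT"))
offsets <- mapply(offset_at, c("XST", "XST"), when,
                  MoreArgs = list(rows = tz_rows))
print(offsets)
```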

For this exercise, my code simply takes the first offset listed for each time zone abbreviation in the database and makes no further daylight saving adjustment. This may be sufficient if your work doesn't require that much precision, but either way you should QA and validate the results.
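
One way to QA the offset-based results, when you know (or are willing to assume) which region each abbreviation came from, is to let R convert the same local times through an Olson time zone name, which applies the daylight saving rules itself, and compare. Mapping EEST to Europe/Athens and IDT to Asia/Jerusalem is an assumption for illustration (EEST is used in several countries), and the check relies on the system's time zone database being installed:

```r
# Assumption: EEST rows come from Europe/Athens, IDT rows from Asia/Jerusalem
local_time <- c("2015-10-04 20:33:05", "2015-09-28 10:02:38")
zone_name  <- c("Europe/Athens", "Asia/Jerusalem")

# Offset-based result: EEST and IDT are both UTC+3 (10800 seconds)
by_offset <- as.POSIXct(local_time, tz = "GMT") - c(10800, 10800)

# Olson-based result: interpret each local time in its named zone
by_olson <- .POSIXct(unname(mapply(function(t, z) as.numeric(as.POSIXct(t, tz = z)),
                                   local_time, zone_name)), tz = "GMT")

# Any non-zero difference (in seconds) flags a row to investigate
diff_secs <- as.numeric(by_offset) - as.numeric(by_olson)
print(diff_secs)
```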

Also note that this solution handles date changes correctly, because the offset is subtracted from a full date-time value rather than from the clock time alone. For example, if converting to GMT moves a time from 1 a.m. back to 11 p.m., the date rolls back by one day.
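
A quick check of the rollover, using a made-up input "Mon Sep 28 01:30:00 IDT 2015" (not from the original data) with the abbreviation already removed, and IDT's UTC+3 offset:

```r
# Assumption: "C" LC_TIME locale so the English names parse
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Made-up local time shortly after midnight, read as if it were GMT
local_as_gmt <- as.POSIXct("Mon Sep 28 01:30:00 2015",
                           format = "%a %b %d %H:%M:%S %Y", tz = "GMT")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Subtract IDT's offset (UTC+3); the result falls on the previous day
utc <- local_as_gmt - 3 * 3600
print(format(utc, "%Y-%m-%d %H:%M:%S"))  # "2015-09-27 22:30:00"
```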
