Formatting Dates in R (Non-standard format)

Question

I'm not new to R or to formatting dates in R and wouldn't normally ask this, but I'm seeing seriously strange behavior and after 2 hours am no closer to resolving it.

I have imported a dataset and want to convert its date/time column with as.POSIXct. The dates are in a non-standard format, and I've applied what I believe is the correct format string. Below is a small part of the data I'm having trouble with, followed by the code. The problem is that I get 4 NAs starting at "2015-03-08 02:00:00 PST". What gives? It seems completely random, as it happens nowhere else in the other 55K observations.

bad.Dates<-c("3/7/2015 14:15", "3/7/2015 14:30", "3/7/2015 14:45", "3/7/2015 15:00", 
         "3/7/2015 15:15", "3/7/2015 15:30", "3/7/2015 15:45", "3/7/2015 16:00", 
         "3/7/2015 16:15", "3/7/2015 16:30", "3/7/2015 16:45", "3/7/2015 17:00", 
         "3/7/2015 17:15", "3/7/2015 17:30", "3/7/2015 17:45", "3/7/2015 18:00", 
         "3/7/2015 18:15", "3/7/2015 18:30", "3/7/2015 18:45", "3/7/2015 19:00", 
         "3/7/2015 19:15", "3/7/2015 19:30", "3/7/2015 19:45", "3/7/2015 20:00", 
         "3/7/2015 20:15", "3/7/2015 20:30", "3/7/2015 20:45", "3/7/2015 21:00", 
         "3/7/2015 21:15", "3/7/2015 21:30", "3/7/2015 21:45", "3/7/2015 22:00", 
         "3/7/2015 22:15", "3/7/2015 22:30", "3/7/2015 22:45", "3/7/2015 23:00", 
         "3/7/2015 23:15", "3/7/2015 23:30", "3/7/2015 23:45", "3/8/2015 0:00", 
         "3/8/2015 0:15", "3/8/2015 0:30", "3/8/2015 0:45", "3/8/2015 1:00", 
         "3/8/2015 1:15", "3/8/2015 1:30", "3/8/2015 1:45", "3/8/2015 2:00", 
         "3/8/2015 2:15", "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00", 
         "3/8/2015 3:15", "3/8/2015 3:30", "3/8/2015 3:45", "3/8/2015 4:00", 
         "3/8/2015 4:15", "3/8/2015 4:30", "3/8/2015 4:45", "3/8/2015 5:00", 
         "3/8/2015 5:15", "3/8/2015 5:30", "3/8/2015 5:45", "3/8/2015 6:00", 
         "3/8/2015 6:15", "3/8/2015 6:30", "3/8/2015 6:45", "3/8/2015 7:00", 
         "3/8/2015 7:15", "3/8/2015 7:30", "3/8/2015 7:45", "3/8/2015 8:00", 
         "3/8/2015 8:15", "3/8/2015 8:30", "3/8/2015 8:45", "3/8/2015 9:00", 
         "3/8/2015 9:15", "3/8/2015 9:30", "3/8/2015 9:45", "3/8/2015 10:00", 
         "3/8/2015 10:15", "3/8/2015 10:30", "3/8/2015 10:45", "3/8/2015 11:00", 
         "3/8/2015 11:15", "3/8/2015 11:30", "3/8/2015 11:45", "3/8/2015 12:00", 
         "3/8/2015 12:15", "3/8/2015 12:30", "3/8/2015 12:45", "3/8/2015 13:00", 
         "3/8/2015 13:15", "3/8/2015 13:30", "3/8/2015 13:45", "3/8/2015 14:00", 
         "3/8/2015 14:15", "3/8/2015 14:30", "3/8/2015 14:45", "3/8/2015 15:00", 
         "3/8/2015 15:15") 

as.POSIXct(strptime(bad.Dates,"%m/%d/%Y %H:%M"))

Answer

To make this example reproducible/solvable regardless of location, specify the time zone explicitly via tz=:

bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")
as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="US/Pacific")

#[1] "2015-03-08 01:45:00 PST"
#[2] NA                       
#[3] NA                       
#[4] NA                       
#[5] NA                       
#[6] "2015-03-08 03:00:00 PDT"

You get NAs because those times don't exist in the modern-day timekeeping of the US Pacific region.

Most of the United States, Canada, and Mexico's northern border cities will begin Daylight Saving Time (DST) on Sunday, March 8, 2015. People in areas that observe DST will spring forward one hour from 2am (02:00) to 3am (03:00), local time.
Source: http://www.timeanddate.com/news/time/usa-canada-start-dst-2015.html
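To check whether NAs in a larger dataset come from this kind of gap rather than from malformed strings, one option is to parse the same strings twice and compare. This is only a sketch: the vector `x` and the names `local`, `utc` and `in_gap` are made up for illustration, and "America/Los_Angeles" is used as the canonical tz database name for the same zone as "US/Pacific". How R treats nonexistent local times can depend on the platform and R version, so the result may not match the NA output above on every system.

```r
# Hypothetical sample spanning the 2015 spring-forward transition.
x <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:30", "3/8/2015 3:00")
fmt <- "%m/%d/%Y %H:%M"

# Parse once in the DST-observing zone and once in UTC, which has no gaps.
local <- as.POSIXct(x, format = fmt, tz = "America/Los_Angeles")
utc   <- as.POSIXct(x, format = fmt, tz = "UTC")

# A string that parses in UTC but not locally is well-formed,
# yet names a wall-clock time that the local zone skipped.
in_gap <- is.na(local) & !is.na(utc)

# Show the offending strings (if any) for inspection.
x[in_gap]
```

Strings that are NA in both parses are more likely to be genuinely malformed and deserve a separate look.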

Specifying a time zone such as "UTC" that doesn't observe daylight saving time gets around the issue, though the parsed values are then treated as UTC instants rather than Pacific local times.

as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="UTC")
#[1] "2015-03-08 01:45:00 UTC"
#[2] "2015-03-08 02:00:00 UTC"
#[3] "2015-03-08 02:15:00 UTC"
#[4] "2015-03-08 02:30:00 UTC"
#[5] "2015-03-08 02:45:00 UTC"
#[6] "2015-03-08 03:00:00 UTC"
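If the timestamps need to stay anchored to Pacific time, another possibility is a fixed-offset zone. This sketch rests on an assumption the question does not confirm: that the data logger stayed on standard time (UTC-8) all year, which the unbroken 15-minute readings through the 2:00 hour hint at. "Etc/GMT+8" is the tz database name for a constant UTC-8 offset (the sign in "Etc/" names is inverted), and `fixed` is just an illustrative variable name.

```r
bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")

# Assumption: readings were recorded on a clock fixed at UTC-8 (no DST).
# A fixed-offset zone has no skipped hour, so every string maps to an instant.
fixed <- as.POSIXct(bad.Dates, format = "%m/%d/%Y %H:%M", tz = "Etc/GMT+8")

# No NAs are expected, and consecutive readings stay 15 minutes apart.
anyNA(fixed)
diff(fixed)

# The same instants expressed in UTC, e.g. for joining with other data.
format(fixed, tz = "UTC", usetz = TRUE)
```

If the logger did in fact switch to daylight time, this would shift every reading after the transition by an hour, so it is worth confirming how the source clock was configured first.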
