Format for ordinal dates (day of month with suffixes -st, -nd, -rd, -th)
Question
Am I missing something? I can't figure out how to convert the following to Dates, where the day of the month (%d) has the ordinal suffixes -st, -nd, -rd, -th:
ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
"September 3rd, 2016", "September 4th, 2016")
?strptime doesn't appear to list a shorthand for the ordinal suffix, and it isn't handled automagically:
as.Date(ord_dates, format = c("%B %d, %Y"))
#[1] NA NA NA NA
Is there a token for handling ignored characters in the format argument? A token I'm missing?
Best I can come up with is this (there may be a shorter regex, but the idea is the same):
as.Date(gsub("([0-9]+)(st|nd|rd|th)", "\\1", ord_dates), format = "%B %d, %Y")
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"
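The regex only strips "st", "nd", "rd" or "th" when it directly follows digits, so month names that contain those letters (such as "August") are left alone. Below is a minimal sketch wrapping that idea in a reusable function. parse_ordinal_date() is a hypothetical helper, not part of base R, and the example assumes an English locale so that %B matches the month names.

```r
# Hypothetical helper (not part of base R): remove an ordinal suffix only
# where it directly follows digits, then parse with ordinary strptime codes.
parse_ordinal_date <- function(x, format = "%B %d, %Y") {
  stripped <- gsub("([0-9]+)(st|nd|rd|th)", "\\1", x)
  as.Date(stripped, format = format)
}

x <- c("August 21st, 2016", "August 22nd, 2016", "August 23rd, 2016")

# The "st" inside "August" survives because no digits precede it
stripped <- gsub("([0-9]+)(st|nd|rd|th)", "\\1", x)
stripped
# [1] "August 21, 2016" "August 22, 2016" "August 23, 2016"

parsed <- parse_ordinal_date(x)
parsed
```

Because the suffix is removed before parsing, the format string stays the plain "%B %d, %Y" from the question.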
Seems like this sort of data should be relatively common; am I missing something?
Answer
Enjoy the power of lubridate:
library(lubridate)
mdy(ord_dates)
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"
Internally, lubridate doesn't have any special conversion specifications which enable this. Rather, lubridate first uses (by smart guessing) the format "%B %dst, %Y". This gets the first element of ord_dates.
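You can reproduce that first guessed format with base as.Date(). Literal characters in a strptime format have to match exactly, so a hard-coded "st" after %d parses only the "1st" element and leaves the others NA. This sketch assumes an English locale so that %B matches "September".

```r
ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

# "st" is a literal in the format, so "2nd", "3rd" and "4th" fail to match
first_pass <- as.Date(ord_dates, format = "%B %dst, %Y")
is.na(first_pass)
# [1] FALSE  TRUE  TRUE  TRUE
format(first_pass[1])
# [1] "2016-09-01"
```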
It then checks for NAs and repeats its smart guessing on the remaining elements, settling on "%B %dnd, %Y" to get the second element. It continues in this way until there are no NAs left (which happens in this case after 4 iterations), or until its smart guessing fails to turn up a likely format candidate.
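The fill-the-NAs loop described above can be imitated in base R. This is a minimal sketch, not lubridate's actual code. parse_with_candidates() is a hypothetical helper, the candidate formats are written by hand instead of being guessed, and an English locale is assumed for %B.

```r
# Hypothetical helper: try each candidate format in turn, but only on the
# elements that are still unparsed (NA), as the lubridate description above says.
parse_with_candidates <- function(x, formats) {
  out <- structure(rep(NA_real_, length(x)), class = "Date")  # all-NA Date vector
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break                 # everything parsed: stop early
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")
candidates <- c("%B %dst, %Y", "%B %dnd, %Y", "%B %drd, %Y", "%B %dth, %Y")
parsed <- parse_with_candidates(ord_dates, candidates)
parsed
```

Each pass leaves earlier successes untouched. The cost is one extra parse over the remaining elements for every candidate format, so this strategy takes longer than stripping the suffix once.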
You can imagine this makes lubridate slower, and it does. It runs at about half the speed of a plain regex followed by as.Date(), along the lines @alistaire suggested above:
set.seed(109123)
ord_dates <- sample(
c("September 1st, 2016", "September 2nd, 2016",
"September 3rd, 2016", "September 4th, 2016"),
1e6, TRUE
)
library(microbenchmark)
microbenchmark(times = 10L,
lubridate = mdy(ord_dates),
base = as.Date(sub("\\D+,", "", ord_dates),
format = "%B %e %Y"))
# Unit: seconds
# expr min lq mean median uq max neval cld
# lubridate 2.167957 2.219463 2.290950 2.252565 2.301725 2.587724 10 b
# base 1.183970 1.224824 1.218642 1.227034 1.228324 1.229095 10 a
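The base expression in the benchmark uses a shorter regex than the one in the question. The sketch below runs it on the four example strings (assuming an English locale). "\\D+," matches the first run of non-digits that ends in a comma, which is the suffix plus its comma. sub() removes only that first match, and %e then reads the bare day number.

```r
ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

# Drop "st,", "nd,", "rd," or "th," (the first non-digit run ending in a comma)
stripped <- sub("\\D+,", "", ord_dates)
stripped
# [1] "September 1 2016" "September 2 2016" "September 3 2016" "September 4 2016"

# %e reads the day of the month; the format no longer needs a comma
parsed <- as.Date(stripped, format = "%B %e %Y")
parsed
```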
The obvious advantage in lubridate's favor is its conciseness and flexibility.