How to most efficiently convert a character string of "01 Jan 2014" to POSIXct, i.e. "2014-01-01" (yyyy-mm-dd)
Question
I already have a partial answer to this problem here, which I understand as far as it is explained: How to most efficiently restructure a character string for fasttime in data.table
However, the task has been extended and now needs to deal with a variation of the original formatting.
I have a large dataset with a column of dates, of class character, in the form:
01 Jan 2014
or:
dd MMM yyyy
which I want to restructure to feed into fastPOSIXct, which only accepts character input in POSIXct order:
yyyy-mm-dd
The question linked above notes that an efficient approach would be to use a regex and then supply the output to fasttime. Do I need to extend this to include a way to recognise the month abbreviations, convert them to numbers, and then rearrange the pieces? How would I do this? I know that month.abb exists as a built-in constant. Should I be using this, or is there a smarter way?
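The month.abb idea raised above could be sketched roughly as follows. This is only an illustration, not code from the original post: the helper name dmy_to_iso is hypothetical, and it assumes every input is exactly in the fixed-width "dd Mon yyyy" form with English month abbreviations.

```r
# Hypothetical helper (not from the original post): convert "dd Mon yyyy"
# strings to "yyyy-mm-dd" using the built-in month.abb constant.
dmy_to_iso <- function(x) {
  # Assumes fixed-width input, so the pieces can be cut out by position
  day  <- substr(x, 1, 2)
  mon  <- substr(x, 4, 6)
  year <- substr(x, 8, 11)
  # Month number (1-12) is the position of the abbreviation in month.abb
  mon_num <- match(mon, month.abb)
  # Zero-pad the month; unrecognised abbreviations give NA
  ifelse(is.na(mon_num),
         NA_character_,
         sprintf("%s-%02d-%s", year, mon_num, day))
}

x <- c("01 Jan 2014", "15 Dec 2013")
iso <- dmy_to_iso(x)
iso

# Base R conversion shown here; if the fasttime package is installed,
# fasttime::fastPOSIXct(iso, tz = "UTC") could be used in its place.
as.POSIXct(iso, tz = "UTC")
```

Because substr, match and sprintf are all vectorised, the helper works on a whole column at once, for example inside a data.table assignment.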
Recommended answer
How about using lubridate:
x <- "01 Jan 2014"
x
[1] "01 Jan 2014"
library(lubridate)
dmy(x)
[1] "2014-01-01 UTC"
Of course, lubridate functions accept a tz argument too. To see a complete list of acceptable values, see OlsonNames().
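As a small sketch of the tz argument, assuming lubridate is installed (the zone name "Europe/London" is only an example value chosen here, not one from the original answer):

```r
library(lubridate)

x <- "01 Jan 2014"

# Supplying tz makes dmy() return a POSIXct in that time zone
parsed <- dmy(x, tz = "Europe/London")
parsed

# OlsonNames() is a base R function listing the time zone names R knows about
head(OlsonNames())
```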
I decided to update this answer with some empirical data, using the microbenchmark package and the lubridate option to use fasttime.
library(microbenchmark)
microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
expr min lq mean median uq max neval
dmy(x) 1.992639 2.02567 2.142212 2.041514 2.07153 39.1384 10000
options(lubridate.fasttime = TRUE)
microbenchmark(dmy(x), times = 10000)
Unit: milliseconds
expr min lq mean median uq max neval
dmy(x) 1.993326 2.02488 2.136748 2.039467 2.065326 163.2008 10000