Cast string directly to IDateTime
Question
I am using the new version of data.table and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise), looking like 01APR2008:09:00:00.
I need to sort the data.table on those datetimes and, for the sort to be efficient, cast them to the IDateTime format (or anything else I don't know about yet).
> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"
> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
idate itime
1: 2008-04-01 09:00:00
> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) :
character string is not in a standard unambiguous format
It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].
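My questions are then:
- Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
- If not, what is the most efficient way to go, knowing that I would like to be able to sort DT by this datetime column?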
Recommended answer
Unfortunately (for efficiency), strptime produces a POSIXlt type, which is unsupported by data.table and always will be due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still does so via POSIXlt. More info here: http://stackoverflow.com/a/12788992/403310
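To make the POSIXlt versus POSIXct distinction concrete, here is a small base R sketch. The sample timestamp and the UTC time zone are assumptions chosen only for illustration.

```r
x <- "2008-04-01 09:00:00"

# strptime() returns POSIXlt: a list of broken-down fields per timestamp
lt <- strptime(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# as.POSIXct() collapses that list into a single number of seconds since epoch
ct <- as.POSIXct(lt)

class(lt)         # "POSIXlt" "POSIXt"
typeof(lt)        # "list"
unclass(lt)$hour  # 9

class(ct)         # "POSIXct" "POSIXt"
typeof(ct)        # "double"
```

The list structure is what makes POSIXlt awkward as a column type, while POSIXct is a plain numeric vector.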
Looking at base functions such as as.Date, it uses strptime too, creating an integer offset from epoch (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. That makes it suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at (usually one-off) conversion.
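As a rough sketch of the storage difference, and of one way to get the question's strings into IDate/ITime columns: the code below assumes data.table is installed, that the column is called stamp (a made-up name), and that the session runs in an English locale so that %b matches "APR". IDateTime() returns two columns, so both are assigned at once rather than into a single new column.

```r
library(data.table)

typeof(as.Date("2008-04-01"))   # "double"  (base Date)
typeof(as.IDate("2008-04-01"))  # "integer" (data.table IDate)

# Hypothetical table holding the strings as fread would load them
DT <- data.table(stamp = c("01APR2008:09:00:00", "31MAR2008:17:30:00"))

# Parse once to POSIXct (this still goes through POSIXlt internally),
# then split into the two integer-backed columns returned by IDateTime().
# %b is locale-dependent; an English locale is assumed here.
DT[, c("idate", "itime") := IDateTime(
  as.POSIXct(stamp, format = "%d%b%Y:%H:%M:%S", tz = "UTC")
)]

# Sort by date, then time
setkey(DT, idate, itime)
```

This is a one-off conversion cost; the later sorts and joins then work on integer columns.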
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24", I'd lean towards as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be HHMMSS as an integer. Two separate columns, date and time, can be useful if you generally want roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway, because you rarely want to add calendar days; more often you want weekdays or business days. So that's a lookup to your business day vector anyway.
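A minimal base R sketch of the YYYYMMDD idea described above; the sample dates are made up for illustration.

```r
col <- c("2012-12-24", "2012-12-03", "2013-01-15")

# Strip the separators and store each date as a YYYYMMDD integer
date <- as.integer(gsub("-", "", col, fixed = TRUE))

# Integer dates in this layout sort chronologically
sorted <- sort(date)

# Dropping the last two digits leaves YYYYMM, a ready-made month key
month_key <- date %/% 100L

# Count rows per month (in data.table this would be by = date %/% 100L)
counts <- table(month_key)
```

The same trick with %/% 10000L gives a year key.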
In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way, followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally, such as %Y-%m-%d or %Y%m%d.
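The substring-plus-match approach could look roughly like this in base R. The helper name parse_ddmonyyyy is hypothetical, and the sketch assumes every string has the fixed layout DDMONYYYY:HH:MM:SS with English month abbreviations.

```r
# Hypothetical helper: fixed-width "01APR2008:09:00:00" -> integer date and time
parse_ddmonyyyy <- function(x) {
  day   <- as.integer(substring(x, 1L, 2L))
  # Look the month abbreviation up in the built-in English month.abb (1:12)
  month <- match(toupper(substring(x, 3L, 5L)), toupper(month.abb))
  year  <- as.integer(substring(x, 6L, 9L))
  hh    <- as.integer(substring(x, 11L, 12L))
  mm    <- as.integer(substring(x, 14L, 15L))
  ss    <- as.integer(substring(x, 17L, 18L))
  list(
    date = year * 10000L + month * 100L + day,  # YYYYMMDD
    time = hh * 10000L + mm * 100L + ss         # HHMMSS
  )
}

res <- parse_ddmonyyyy(c("01APR2008:09:00:00", "24DEC2012:17:30:05"))
res$date  # 20080401 20121224
res$time  # 90000 173005
```

Because it never touches strptime, this avoids both POSIXlt and any dependence on the session locale. An unknown month abbreviation yields NA rather than an error.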
I haven't yet got to how best to do this in fread, so date/times are left as character currently because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect that my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch-based dates too. All I'm suggesting is that dates don't always have to be epoch based.
What do you think? Btw, thanks for the encouragement on fread; it was nice to see.