Cast string directly to IDateTime

Question

I am using the new version of data.table, and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise), looking like 01APR2008:09:00:00.

I need to sort the data.table on those datetimes, and for the sort to be efficient I want to cast them to the IDateTime format (or to anything else I don't know about yet).

> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"

> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
        idate    itime
1: 2008-04-01 09:00:00

> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) : 
character string is not in a standard unambiguous format 

It seems I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].

So my questions are:

  1. Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
  2. If not, what is the most efficient way to go, given that I would like to be able to sort DT by this datetime column?

Recommended answer

Unfortunately (for efficiency), strptime produces a POSIXlt type, which is unsupported by data.table and always will be, due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still gets there via POSIXlt. More info here:

http://stackoverflow.com/a/12788992/403310
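To make the structural difference concrete, here is a small base-R sketch (not part of the original answer) that parses one timestamp into both classes. It deliberately uses a numeric ISO format and tz = "UTC" so that the result does not depend on the locale's month names or on daylight saving.

```r
# Parse the same timestamp into both date-time classes (UTC avoids DST surprises).
x  <- "2008-04-01 09:00:00"
lt <- strptime(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # POSIXlt: a list of fields
ct <- as.POSIXct(lt)                                # POSIXct: one number per time

# POSIXlt keeps each component (sec, min, hour, mday, mon, year, ...) in its own vector.
is.list(unclass(lt))
names(unclass(lt))[1:6]

# POSIXct is just seconds since the epoch, stored as a double.
typeof(unclass(ct))
as.numeric(ct)
```

The POSIXct value is a single atomic vector, which is why it is so much friendlier to store in a column and sort than the list-based POSIXlt.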

Looking at base functions such as as.Date: it uses strptime too, creating an integer offset from the epoch that is (oddly) stored as a double. The IDate class (and friends) in data.table aims to store integer epoch offsets as, um, integers. That makes them suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at the (usually one-off) conversion.
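A minimal base-R sketch of the double-vs-integer point, assuming R >= 3.3.0 (where sort.list gained method = "radix"). It only illustrates the storage types; with data.table loaded, as.IDate() would be the packaged way to get an integer-backed date.

```r
# as.Date() parses via strptime and stores days since 1970-01-01 as a double.
d <- as.Date(c("2012-12-24", "2008-04-01", "2010-06-15"))
typeof(unclass(d))

# The same epoch offsets held as genuine integers (what IDate aims for).
days <- as.integer(d)
typeof(days)

# Integer keys can be ordered with the radix method of base::sort.list().
ord <- sort.list(days, method = "radix")
d[ord]
```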

So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.

If the string date is "2012-12-24", I'd lean towards as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be HHMMSS as an integer. Keeping date and time in two separate columns can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway, because you rarely want to add calendar days; more often it's weekdays or business days. So that's a lookup into your business-day vector anyway.
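The YYYYMMDD / HHMMSS idea can be sketched in base R as follows. The input vectors col and time_chr are hypothetical stand-ins for character columns read by fread; the radix order on (date, time) assumes R >= 3.3.0.

```r
# Hypothetical input columns: ISO dates and HH:MM:SS times as character.
col      <- c("2012-12-24", "2012-11-05", "2013-01-02")
time_chr <- c("09:00:00", "17:30:15", "00:00:01")

# YYYYMMDD and HHMMSS integers (fixed = TRUE: plain string match, no regex).
date <- as.integer(gsub("-", "", col, fixed = TRUE))
time <- as.integer(gsub(":", "", time_chr, fixed = TRUE))

# Integer division by 100 drops the day, leaving a YYYYMM month key.
month <- date %/% 100L

# Sort chronologically on (date, time) with the radix method.
ord <- order(date, time, method = "radix")
```

Because both keys are plain integers, they sort naturally; the month key here is the same expression that would go into by = date %/% 100L in a data.table query.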

In your case, the character month would need converting to 1:12. There is no separator in your dates ("01APR2008"), so one way would be substring followed by match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally, such as %Y-%m-%d or %Y%m%d.
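Here is one possible helper along those lines, written in base R. The name parse_ddmonyyyy_hms is hypothetical, and it assumes every string has exactly the fixed-width layout "01APR2008:09:00:00" (2-digit day, 3-letter English month, 4-digit year, a colon, then HH:MM:SS). It uses base match on month.abb rather than fastmatch::fmatch, which keeps it dependency-free and independent of the locale.

```r
# Hypothetical helper for fixed-width strings like "01APR2008:09:00:00".
# Assumes: 2-digit day, 3-letter English month, 4-digit year, ":" then HH:MM:SS.
parse_ddmonyyyy_hms <- function(x) {
  day  <- as.integer(substr(x, 1L, 2L))
  # month.abb is base R's English "Jan".."Dec", so the lookup ignores the locale
  mon  <- match(toupper(substr(x, 3L, 5L)), toupper(month.abb))
  year <- as.integer(substr(x, 6L, 9L))
  time <- as.integer(gsub(":", "", substr(x, 11L, 18L), fixed = TRUE))
  list(date = year * 10000L + mon * 100L + day,  # YYYYMMDD integer
       time = time)                              # HHMMSS integer
}

x   <- c("01APR2008:09:00:00", "15JAN2007:23:59:59", "01APR2008:08:30:00")
res <- parse_ddmonyyyy_hms(x)
ord <- order(res$date, res$time, method = "radix")
```

Since the helper returns a list of two integer vectors, with data.table it should be possible to add both columns in one step, e.g. DT[, c("date", "time") := parse_ddmonyyyy_hms(oldType)], and then setkey(DT, date, time).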

I haven't yet worked out how best to do this in fread, so dates/times are currently left as character, because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates, though, rather than inefficient character. I suspect my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons to epoch-based dates too. All I'm suggesting is that dates don't always have to be epoch based.

What do you think? Btw, thanks for the encouragement on fread; it was nice to see.