Cast string directly to IDateTime


Question

    I am using the new version of data.table, and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise), looking like 01APR2008:09:00:00.

    I need to sort the data.table on those datetimes and, for the sort to be efficient, cast them to the IDateTime format (or anything else I don't know about yet).

    > strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
    [1] "2008-04-01 09:00:00"
    
    > IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
            idate    itime
    1: 2008-04-01 09:00:00
    
    > IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
    Error in charToDate(x) : 
    character string is not in a standard unambiguous format 
    

    It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].

    My questions are then:

    1. Is there a way to cast directly to IDateTime from fread, so that I can sort efficiently afterward?
    2. If not, what is the most efficient way to go, knowing that I would like to be able to sort DT by this datetime column?

Solution

    Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be, due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still gets there via POSIXlt when given a string. More info here:

    http://stackoverflow.com/a/12788992/403310
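
The structural difference can be seen in base R alone. The sketch below parses the sample string from the question. Setting LC_TIME to "C" is an assumption added here so that %b matches English month abbreviations; it is not part of the original answer.

```r
# Make %b match English month abbreviations regardless of the session locale.
invisible(Sys.setlocale("LC_TIME", "C"))

x   <- "01APR2008:09:00:00"
fmt <- "%d%b%Y:%H:%M:%S"

# strptime() returns POSIXlt: a list with one component per date-time field.
lt <- strptime(x, fmt, tz = "UTC")
class(lt)                  # "POSIXlt" "POSIXt"
is.list(unclass(lt))       # TRUE
names(unclass(lt))[1:3]    # "sec" "min" "hour"

# as.POSIXct() returns one double per value (seconds since the epoch),
# but for character input it still parses through POSIXlt internally.
ct <- as.POSIXct(x, format = fmt, tz = "UTC")
class(ct)                  # "POSIXct" "POSIXt"
typeof(ct)                 # "double"
format(ct, "%Y-%m-%d %H:%M:%S")   # "2008-04-01 09:00:00"
```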
    

    Looking at base functions such as as.Date: it uses strptime too, creating an integer offset from the epoch that is (oddly) stored as double. The IDate (and friends) classes in data.table aim to achieve integer epoch offsets stored as, um, integer. That makes them suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at the (usually one-off) conversion.
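
To illustrate the storage difference, and one way to get the two IDateTime columns into a table, here is a minimal sketch. It assumes data.table is installed, uses a made-up table DT with the question's oldType column, and goes through as.POSIXct instead of strptime. It is one possible approach, not the fread-level conversion the question asks for.

```r
library(data.table)

# English month abbreviations for %b, independent of the session locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data in the question's format.
DT <- data.table(oldType = c("02APR2008:10:30:00",
                             "01APR2008:09:00:00",
                             "01APR2008:08:15:30"))

# Base Date is a day offset stored as double; IDate stores it as integer.
typeof(as.Date("2008-04-01"))    # "double"
typeof(as.IDate("2008-04-01"))   # "integer"

# IDateTime() returns two columns, so assign to two names at once.
ct <- as.POSIXct(DT$oldType, format = "%d%b%Y:%H:%M:%S", tz = "UTC")
DT[, c("idate", "itime") := IDateTime(ct)]

# Sort by date, then by time within the date.
setkey(DT, idate, itime)
DT
```

Assigning to a single name, as in the question's attempt, cannot hold both columns, which is why two names appear on the left of `:=` here.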

    So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.

    If the string date is "2012-12-24" I'd lean towards as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly, times can be HHMMSS as an integer. Two separate columns, date and time, can be useful if you generally want roll = TRUE within a day but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway, because you rarely want to add calendar days; more often you want weekdays or business days, and that's a lookup to your business day vector anyway.
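
A small base R sketch of the integer-date idea, using made-up sample values. The time part assumes an HH:MM:SS layout packed as HHMMSS.

```r
col <- c("2012-12-24", "2012-11-05", "2013-01-02")

# Drop the separators and store each date as a YYYYMMDD integer.
date <- as.integer(gsub("-", "", col, fixed = TRUE))
date                # 20121224 20121105 20130102

# Integer order matches chronological order.
col[order(date)]    # "2012-11-05" "2012-12-24" "2013-01-02"

# Integer division by 100 drops the day, leaving YYYYMM for grouping.
date %/% 100L       # 201212 201211 201301

# Times work the same way as HHMMSS integers.
time <- as.integer(gsub(":", "", c("09:00:00", "17:45:10"), fixed = TRUE))
time                # 90000 174510
```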

    In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as %Y-%m-%d, or %Y%m%d.
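
The substring-and-match route could look roughly like the sketch below. The helper name parse_stamp and the sample values are hypothetical, and the fixed character positions assume every string has exactly the layout DDMONYYYY:HH:MM:SS with English month abbreviations.

```r
# Hypothetical helper: split "DDMONYYYY:HH:MM:SS" by position and build
# YYYYMMDD and HHMMSS integers without going through strptime().
parse_stamp <- function(x) {
  day   <- as.integer(substr(x, 1L, 2L))
  # month.abb is always English ("Jan", "Feb", ...), whatever the locale.
  month <- match(toupper(substr(x, 3L, 5L)), toupper(month.abb))
  year  <- as.integer(substr(x, 6L, 9L))
  hh    <- as.integer(substr(x, 11L, 12L))
  mm    <- as.integer(substr(x, 14L, 15L))
  ss    <- as.integer(substr(x, 17L, 18L))
  list(date = year * 10000L + month * 100L + day,
       time = hh * 10000L + mm * 100L + ss)
}

stamps <- c("02APR2008:10:30:00", "01APR2008:09:00:00", "15JAN2008:23:59:59")
p <- parse_stamp(stamps)
p$date    # 20080402 20080401 20080115
p$time    # 103000 90000 235959

# Order by date, then by time within the date.
stamps[order(p$date, p$time)]
# "15JAN2008:23:59:59" "01APR2008:09:00:00" "02APR2008:10:30:00"
```

The helper does no validation: an unrecognised month name yields NA from match, so the resulting date is NA as well.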

    I haven't yet got to how best to do this in fread, so dates/times are currently left as character, because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect that my use of YYYYMMDD integers is seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch-based dates too. Dates don't always have to be epoch based, is all I'm suggesting.

    What do you think? Btw, thanks for encouragement on fread; was nice to see.
