Using Dates with the data.table package


Question


I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. To summarize, I really like plyr, and it has let me do basically everything I wanted. However, my code takes a while to run, and the prospect of speeding things up was enough for me to run some tests. Those tests ended quite soon, and here is the reason.

What I do quite often with plyr is to split my data by a column containing dates and do some calculations:

library(plyr)
DF <-  data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})

However, setting a key on a column of date-times (here POSIXct values from Sys.time()) does not seem to work in data.table:

library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.

If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I don't think it would be good coding practice to constantly convert between dates and numerics. So am I missing something, or is there just no easy way to achieve this with data.table?

sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.6.3 zoo_1.7-2        lubridate_0.2.5  ggplot2_0.8.9    proto_0.3-9.2    reshape_0.8.4   
[7] reshape2_1.1     xtable_1.5-6     plyr_1.5.2      

loaded via a namespace (and not attached):
[1] digest_0.5.0    lattice_0.19-30 stringr_0.5     tools_2.13.1 

Solution

This should work:

DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
                 y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
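With the key set, the ddply call from the question could be written along these lines. This is only a sketch: the fixed timestamp t0 and the set.seed() call are assumptions added to make the example reproducible, and .N (the number of rows in the current group) requires a data.table version that provides it.

```r
library(data.table)

set.seed(1)  # assumption: only here so the example is reproducible
# Hypothetical fixed timestamp used in place of Sys.time()
t0 <- as.POSIXct("2011-07-08 12:00:00", tz = "UTC")

DT <- data.table(Date = as.ITime(rep(c(t0, t0 + 60), each = 6)),
                 y = c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)

# Per group: mean of y minus the last y in the group,
# mirroring mean(df$y) - df[nrow(df), "y"] in the ddply version
res <- DT[, list(V1 = mean(y) - y[.N]), by = Date]
print(res)
```

The result has one row per distinct value of Date, much like the data frame returned by ddply.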

The data.table package contains some date/time classes with integer storage mode. See ?IDateTime:

Date and time classes with integer storage for fast sorting and grouping. Still experimental!

  • IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer.
  • ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
  • IDateTime takes a date-time input and returns a data table with columns date and time.
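One caveat: since ITime holds only the time of day, the same clock time on two different days would fall into one group. If the data spans several days, a sketch like the following, which keys on both columns returned by IDateTime, may be closer to what is needed. The timestamps and y values are made up for illustration.

```r
library(data.table)

# Hypothetical timestamps: the same time of day on two different days
stamps <- as.POSIXct(c("2011-07-08 12:00:00", "2011-07-09 12:00:00"), tz = "UTC")
x <- rep(stamps, each = 3)

# IDateTime() returns a data.table with integer columns idate and itime
DT <- data.table(IDateTime(x), y = c(1, 2, 3, 10, 20, 30))
setkey(DT, idate, itime)

# Group on both columns so equal times on different days stay separate
res <- DT[, list(mean_y = mean(y)), by = list(idate, itime)]
print(res)
```

Grouping on itime alone would merge all six rows here, because both days share the same time of day.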
