Using Dates with the data.table package
Problem description
I recently discovered the data.table package and am now wondering whether I should replace some of my plyr code. In short, I really like plyr, and I have been able to do basically everything I wanted with it. However, my code takes a while to run, and the prospect of speeding things up was enough to make me run some tests. Those tests ended quite quickly, and here is why.
Something I do quite often with plyr is split my data by a column containing dates and then do some calculations:
library(plyr)
DF <- data.frame(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
#Split up data and apply arbitrary function
ddply(DF, .(Date), function(df){mean(df$y) - df[nrow(df), "y"]})
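For reference, here is a base-R sketch of what that ddply() call computes for each group: the group mean of y minus the last y value in the group. The fixed timestamp t0 and the set.seed() call are assumptions added only to make the example reproducible; they replace the Sys.time() calls above.

```r
# Reproducible stand-in for the Sys.time()-based data (t0 is an assumed fixed time)
set.seed(42)
t0 <- as.POSIXct("2011-08-01 12:00:00", tz = "UTC")
DF <- data.frame(Date = rep(c(t0, t0 + 60), each = 6),
                 y = c(rnorm(6, 1), rnorm(6, -1)))

# Split y by the Date column, then apply the same function to each group:
# group mean minus the last value in that group
res <- sapply(split(DF$y, DF$Date), function(y) mean(y) - y[length(y)])
res
```

split() turns the POSIXct column into a factor with chronologically sorted levels, so the first element of res belongs to t0 and the second to t0 + 60.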
However, using a date column (strictly a POSIXct date-time column, since Sys.time() returns POSIXct) does not seem to work in data.table:
library(data.table)
DT <- data.table(Date=rep(c(Sys.time(), Sys.time() + 60), each=6), y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
#Error in setkey(DT, Date) : Column 'Date' cannot be auto converted to integer without losing information.
If I understand the package correctly, I only get substantial speed-ups when I use setkey(). Also, I don't think it would be good coding practice to keep converting between dates and numbers. So am I missing something, or is there simply no easy way to do this with data.table?
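The error comes from how the column is stored: a POSIXct value is a double counting seconds since 1970-01-01, and Sys.time() normally includes fractional seconds, so in data.table 1.6.3 (where key columns had to be coercible to integer) the conversion would lose information. A small base-R check, using an illustrative timestamp with half a second chosen for this example:

```r
# Illustrative timestamp with a fractional part (value chosen for the example)
stamp <- as.POSIXct(1312200000.5, origin = "1970-01-01", tz = "UTC")

storage  <- typeof(stamp)             # "double": POSIXct is a double underneath
secs     <- as.numeric(stamp)         # 1312200000.5 seconds since the epoch
lossless <- as.integer(secs) == secs  # FALSE: the half second would be dropped
```

Later data.table releases relaxed this restriction and accept double columns as keys; with 1.6.3 the practical options are the integer-backed date/time classes, or rounding the timestamps to whole seconds yourself.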
sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.3 zoo_1.7-2 lubridate_0.2.5 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4
[7] reshape2_1.1 xtable_1.5-6 plyr_1.5.2
loaded via a namespace (and not attached):
[1] digest_0.5.0 lattice_0.19-30 stringr_0.5 tools_2.13.1
Solution
This should work:
DT <- data.table(Date=as.ITime(rep(c(Sys.time(), Sys.time() + 60), each=6)),
y=c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, Date)
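Keep in mind that as.ITime() keeps only the time of day, so rows from different days with the same clock time would land in the same group. A sketch that keeps both parts, assuming a current data.table release: as.IDate() and as.ITime() split the POSIXct into two integer-backed columns, which are then keyed and grouped on together. The column names day and tod, the fixed timestamp t0, and the use of .N (number of rows in the current group) are choices made for this example, not part of the answer above.

```r
library(data.table)

# Assumed fixed timestamps instead of Sys.time(), for reproducibility
set.seed(42)
t0 <- as.POSIXct("2011-08-01 12:00:00", tz = "UTC")
stamps <- rep(c(t0, t0 + 60), each = 6)

# Split each date-time into an integer-backed date and time of day
DT <- data.table(day = as.IDate(stamps, tz = "UTC"),
                 tod = as.ITime(stamps, tz = "UTC"),
                 y   = c(rnorm(6, 1), rnorm(6, -1)))
setkey(DT, day, tod)

# Same per-group computation as the ddply() call: mean(y) minus the last y
res <- DT[, list(v = mean(y) - y[.N]), by = list(day, tod)]
res
```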
The data.table package contains some date/time classes with integer storage mode.
See ?IDateTime:
Date and time classes with integer storage for fast sorting and grouping. Still experimental!
- IDate is a date class derived from Date. It has the same internal representation as the Date class, except the storage mode is integer.
- ITime is a time-of-day class stored as the integer number of seconds in the day. as.ITime does not allow days longer than 24 hours. Because ITime is stored in seconds, you can add it to a POSIXct object, but you should not add it to a Date object.
- IDateTime takes a date-time input and returns a data table with columns date and time.
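To make the storage description concrete without depending on data.table, here is a base-R sketch of the two representations: a Date whose underlying vector is integer rather than double (the idea behind IDate), and the time of day as an integer count of seconds since midnight (the idea behind ITime). It only mimics the documented representation and is not how data.table implements these classes; the seconds-of-day arithmetic assumes UTC.

```r
t0 <- as.POSIXct("2011-08-01 12:34:56", tz = "UTC")

# IDate idea: same day count as a Date (days since 1970-01-01),
# but held in an integer vector instead of a double
d <- structure(as.integer(unclass(as.Date(t0))), class = "Date")

# ITime idea: integer seconds since midnight (UTC assumed here)
secs_of_day <- as.integer(as.numeric(t0) %% 86400)

# A time of day alone cannot tell days apart: 24 hours later gives the same value
secs_next_day <- as.integer(as.numeric(t0 + 86400) %% 86400)
```

Here d still prints as 2011-08-01, and both secs_of_day and secs_next_day are 45296 (12 h 34 min 56 s), which is why a time-of-day key on its own only works when all rows fall on the same day.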