Using data.table to speed up rollapply

Question

I am new to data.table, so apologies if this is a very basic question.

I have heard that data.table significantly improves computation times when working with large amounts of data, so I would like to see whether data.table can help speed up the rollapply function.

If we have some univariate data:

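library(xts)  # attaches zoo as well, which provides rollapply()
xts.obj <- xts(rnorm(1e6), order.by = as.POSIXct(Sys.time() - 1e6:1), tzone = "GMT")  # tzone, not tz, sets the index time zone
colnames(xts.obj) <- "rtns"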

A simple rolling quantile with a width of 100 and p = 0.75 takes a surprisingly long time. This line of code

xts.obj$quant.75 <- rollapply(xts.obj$rtns,width=100, FUN='quantile', p=0.75) 

seems to take forever...

Is there anything data.table can do to speed this up? That is, is there a generic rolling function that can be applied?

Perhaps a routine that converts the xts object to a data.table, runs the function faster there, and converts the result back to xts at the end?

Thanks in advance.

P.S. I didn't seem to be getting much of a response on the data.table mailing list, so I am posting here to see if I get a better one.

P.P.S. Trying another quick example with data frames, the data.table solution seems to take longer than the rollapply function, as shown below:

> x <- data.frame(x=rnorm(10000))
> x.dt <- data.table(x)
> system.time(l1 <- as.numeric(rollapply(x,width=10,FUN=quantile,p=0.75)))   
   user  system elapsed 
   2.69    0.00    2.68 
> system.time(l <- as.numeric(unlist(x.dt[,lapply(1:((nrow(x.dt))-10+1), function(i){ x.dt[i:(i+10-1),quantile(x,p=0.75)]})])))
   user  system elapsed 
  11.22    0.00   11.51 
> identical(l,l1)
[1] TRUE

Recommended answer

data.table is irrelevant here: you're essentially running sapply over a vector, which is pretty much the fastest operation you can get (other than going to C). Data frames and data.tables will always be slower than plain vectors. You can gain a little by using a plain vector (avoiding xts method dispatch), but the only easy way to make this substantially faster is to parallelize:

> library(parallel)  # provides mclapply()
> x = as.vector(xts.obj$rtns)
> system.time(unclass(mclapply(1:(length(x) - 99),
                      function(i) quantile(x[i:(i + 99)], p=0.75), mc.cores=32)))
   user  system elapsed 
325.481  15.533  11.221 
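The answer notes that skipping xts dispatch and working on a plain numeric vector gains a little on its own. A minimal serial sketch of that idea in base R follows; `roll_quantile_vec` is a hypothetical helper name, and each result covers the window `x[i:(i + width - 1)]`, matching the mclapply call above. Replacing `vapply` with `parallel::mclapply` gives the parallel variant; note that `mc.cores > 1` is not supported on Windows, where forking is unavailable.

```r
# Serial rolling quantile over a plain numeric vector (no xts/zoo dispatch).
# roll_quantile_vec is a hypothetical helper, not part of any package.
roll_quantile_vec <- function(x, width, prob) {
  x <- as.numeric(x)  # drop xts/zoo attributes so indexing stays cheap
  starts <- seq_len(length(x) - width + 1)
  vapply(starts, function(i) {
    # quantile() sorts each window afresh; names = FALSE drops the "75%" label
    quantile(x[i:(i + width - 1)], probs = prob, names = FALSE)
  }, numeric(1))
}

# Usage on simulated returns (a smaller series than the question's 1e6)
set.seed(1)
rtns <- rnorm(1e4)
q75 <- roll_quantile_vec(rtns, width = 100, prob = 0.75)
length(q75)  # 9901, i.e. length(rtns) - 99
```

This is still the naive approach (every window is sorted again), so it only removes the dispatch overhead; it does not change the algorithmic cost.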

If you need it even faster, you may want to write a specialized function. The naive apply approach re-sorts every window, which is clearly wasteful: when the window slides by one position, all you need to do is drop the outgoing element and insert the incoming one into the already-sorted window to obtain the quantile. Done that way, you can expect roughly a 50x speedup, but you'll have to code it yourself (so it's only worth it if you use it often).
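A minimal sketch of that specialized approach in base R, assuming R's default type-7 quantile, input with no NA values, and a width of at least 2; `roll_quantile_sorted` is a hypothetical helper name. It sorts only the first window, then for each step removes one copy of the outgoing value (located with `match()`) and inserts the incoming value at its sorted position (located with `findInterval()`), instead of sorting every window.

```r
# Rolling type-7 quantile (R's default) that keeps the current window sorted.
# Each step drops the outgoing value and inserts the incoming one at its
# sorted position instead of re-sorting the whole window.
# Assumptions: x has no NA values, 2 <= width <= length(x), 0 <= prob <= 1.
# roll_quantile_sorted is a hypothetical helper, not part of any package.
roll_quantile_sorted <- function(x, width, prob) {
  x <- as.numeric(x)
  n <- length(x)
  stopifnot(!anyNA(x), width >= 2, width <= n, prob >= 0, prob <= 1)
  # Type-7 position within a sorted window: interpolate between lo and hi
  idx <- 1 + (width - 1) * prob
  lo <- floor(idx)
  hi <- ceiling(idx)
  h <- idx - lo
  out <- numeric(n - width + 1)
  win <- sort(x[1:width])  # the only full sort
  out[1] <- (1 - h) * win[lo] + h * win[hi]
  if (length(out) > 1) {
    for (i in 2:length(out)) {
      # Remove one copy of the value leaving the window
      win <- win[-match(x[i - 1], win)]
      # findInterval() counts elements <= new value, i.e. its insert position
      new <- x[i + width - 1]
      win <- append(win, new, after = findInterval(new, win))
      out[i] <- (1 - h) * win[lo] + h * win[hi]
    }
  }
  out
}

# Usage: compare against the naive rolling quantile
set.seed(42)
z <- rnorm(2000)
fast <- roll_quantile_sorted(z, width = 100, prob = 0.75)
slow <- sapply(1:(length(z) - 99),
               function(i) quantile(z[i:(i + 99)], probs = 0.75, names = FALSE))
all.equal(fast, slow)  # TRUE
```

In pure R each step still costs O(width), because subsetting and `append()` copy the vector, and the loop runs in the interpreter; so this version mainly illustrates the algorithm. Getting the kind of speedup the answer estimates would mean writing the same logic in compiled code, as the answer suggests.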