Using data.table to speed up rollapply


Question


I am new to data.table, so apologies if this is a very basic question.

I have heard that data.table significantly improves computation times when working with large amounts of data, so I would like to see whether it can help speed up the rollapply function.

Suppose we have some univariate data:

library(xts)  # attaches zoo as well, which provides rollapply
xts.obj <- xts(rnorm(1e6), order.by = as.POSIXct(Sys.time() - 1e6:1), tz = "GMT")
colnames(xts.obj) <- "rtns"

A simple rolling quantile with a width of 100 and a p of 0.75 takes a surprisingly long time.

That is, this line of code:

xts.obj$quant.75 <- rollapply(xts.obj$rtns,width=100, FUN='quantile', p=0.75) 

seems to take forever.

Is there anything data.table can do to speed things up? In other words, is there a generic rolling function that can be applied?

Perhaps a routine that converts an xts object to a data.table, runs the function faster there, and then converts the result back to xts at the end?

Thanks in advance,

hlm

P.S. I didn't get much of a response on the data.table mailing list, so I am posting here to see if I get a better one.

P.P.S. After a quick try with another example using data frames, the data.table solution seems to take longer than the rollapply function, as shown below:

> x <- data.frame(x=rnorm(10000))
> x.dt <- data.table(x)
> system.time(l1 <- as.numeric(rollapply(x,width=10,FUN=quantile,p=0.75)))   
   user  system elapsed 
   2.69    0.00    2.68 
> system.time(l <- as.numeric(unlist(x.dt[,lapply(1:((nrow(x.dt))-10+1), function(i){ x.dt[i:(i+10-1),quantile(x,p=0.75)]})])))
   user  system elapsed 
  11.22    0.00   11.51 
> identical(l,l1)
[1] TRUE

Solution

data.table is largely irrelevant here: you're essentially running sapply on a vector, which is pretty much the fastest operation you can get (other than going to C). Data frames and data tables will always be slower than vectors. You can gain a bit by using a plain vector (without xts dispatch), but the only easy way to get this done quickly is to parallelize:

> library(parallel)  # provides mclapply
> x = as.vector(xts.obj$rtns)
> system.time(unlist(mclapply(1:(length(x) - 99),
                      function(i) quantile(x[i:(i + 99)], p=0.75), mc.cores=32)))
   user  system elapsed 
325.481  15.533  11.221 
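
For reference, the serial version of the same idea on a plain vector might look like the sketch below. `roll_quantile_naive` is a hypothetical helper name (it is not part of zoo, xts, or data.table) and it uses only base R. It still calls quantile() once per window, so it avoids the xts dispatch but not the repeated sorting.

```r
# Hypothetical helper (not from the original answer): naive rolling quantile
# on a plain numeric vector, with one quantile() call per complete window.
roll_quantile_naive <- function(x, width, p) {
  n_windows <- length(x) - width + 1L
  stopifnot(n_windows >= 1L)
  vapply(
    seq_len(n_windows),
    function(i) quantile(x[i:(i + width - 1L)], probs = p, names = FALSE),
    numeric(1)
  )
}

# Small illustrative input; the question uses 1e6 points.
set.seed(1)
x <- rnorm(1000)
q_naive <- roll_quantile_naive(x, width = 100, p = 0.75)
length(q_naive)  # one value per complete window: length(x) - width + 1
```

The result is shorter than the input by width - 1 values, so it has to be aligned with the time index before being assigned back to the xts object.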

If you need it even faster, you may want to write a specialized function: the naive apply approach re-sorts every chunk, which is obviously wasteful. All you need to do is drop the one element that leaves the window and insert the next one into the sorted order to obtain the quantile, so you can expect roughly a 50x speedup if you do that. You'll have to code it yourself, though, so it's only worth it if you use it often.
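
A minimal sketch of that incremental idea in base R follows. `roll_quantile_sorted` is a hypothetical name, and the sketch assumes a numeric vector without NA values and R's default quantile definition (type 7). It has not been benchmarked against the figure above; a pure R loop carries interpreter overhead, so a compiled implementation would be the natural next step.

```r
# Hypothetical helper: rolling quantile that keeps the window sorted and
# updates it incrementally instead of re-sorting every window.
# Assumptions: x is numeric with no NA values, 0 <= p <= 1, type 7 quantile.
roll_quantile_sorted <- function(x, width, p) {
  n <- length(x)
  stopifnot(width >= 1L, n >= width, !anyNA(x), p >= 0, p <= 1)
  out <- numeric(n - width + 1L)

  # Fractional position of the quantile inside a sorted window (type 7).
  h    <- (width - 1) * p + 1
  lo   <- floor(h)
  hi   <- ceiling(h)
  frac <- h - lo

  win <- sort(x[seq_len(width)])
  out[1L] <- (1 - frac) * win[lo] + frac * win[hi]

  for (i in seq_len(n - width)) {
    # Drop the element that leaves the window (first match if duplicated).
    win <- win[-match(x[i], win)]
    # Insert the incoming element at its sorted position.
    new_val <- x[i + width]
    pos <- findInterval(new_val, win)  # number of elements <= new_val
    win <- append(win, new_val, after = pos)
    out[i + 1L] <- (1 - frac) * win[lo] + frac * win[hi]
  }
  out
}

# Compare against one quantile() call per window on a small input.
set.seed(42)
x <- rnorm(2000)
fast <- roll_quantile_sorted(x, width = 100, p = 0.75)
slow <- vapply(
  seq_len(length(x) - 99L),
  function(i) quantile(x[i:(i + 99L)], probs = 0.75, names = FALSE),
  numeric(1)
)
all.equal(fast, slow)
```

Each step here copies the window (O(width)) rather than sorting it (O(width log width)); a balanced tree or a pair of heaps would reduce that further at the cost of more code.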
