R data.table sliding window

Problem description

What is the best (fastest) way to implement a sliding window function with the data.table package?

I'm trying to calculate a rolling median but have multiple rows per date (due to 2 additional factors), which I think means that the zoo rollapply function wouldn't work. Here is an example using a naive for loop:

library(data.table)
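df <- data.frame(
  id=1:30000,                  # one id per row
  date=rep(as.IDate(as.IDate("2012-01-01")+0:29, origin="1970-01-01"), each=1000),
  factor1=rep(1:5, each=200),  # recycled over the 30000 rows
  factor2=1:5,                 # recycled over the 30000 rows
  value=rnorm(30000, 100, 10)  # one random value per row
)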

dt <- data.table(df)
setkeyv(dt, c("date", "factor1", "factor2"))

get_window <- function(date, factor1, factor2) {
  criteria <- data.table(
    date=as.IDate((date - 7):(date - 1), origin="1970-01-01"),
    factor1=as.integer(factor1),
    factor2=as.integer(factor2)
  )
  return(dt[criteria][, value])
}

output <- data.table(unique(dt[, list(date, factor1, factor2)]))[, window_median:=as.numeric(NA)]

for(i in nrow(output):1) {
  print(i)
  output[i, window_median:=median(get_window(date, factor1, factor2))]
}
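The loop above runs one keyed join per output row. A minimal loop-free sketch in data.table follows. Like the loop, it assumes each window pools every row of the same factor1/factor2 group from the 7 dates strictly before the current date. Each row is copied once for every date whose window it belongs to, and all medians are then computed in a single grouped pass. The names `window_len`, `expanded`, `target_date`, `medians` and `n_dates` are hypothetical, and the data are regenerated here with one random value per row. The loop returns NA whenever any of the 7 dates is missing, because unmatched join rows carry NA values into `median()`. The sketch reproduces that by counting the distinct dates in each window.

```r
library(data.table)

# Synthetic data shaped like the question's: 30 dates x 1000 rows,
# 25 (factor1, factor2) groups per date, one random value per row
set.seed(42)
n  <- 30000L
dt <- data.table(
  date    = rep(as.IDate("2012-01-01") + 0:29, each = 1000L),
  factor1 = rep(rep(1:5, each = 200L), length.out = n),
  factor2 = rep(1:5, length.out = n),
  value   = rnorm(n, 100, 10)
)

window_len <- 7L  # hypothetical: window = the 7 dates strictly before each date

# A row dated d lies in the window of every target date d + 1, ..., d + 7,
# so copy each row window_len times and tag each copy with one target date
expanded <- dt[rep(seq_len(nrow(dt)), each = window_len)]
expanded[, target_date := date + rep_len(seq_len(window_len), .N)]

# One grouped pass: pooled median per (target date, factor1, factor2),
# plus how many distinct source dates actually fed that window
medians <- expanded[, .(window_median = median(value),
                        n_dates       = uniqueN(date)),
                    by = .(target_date, factor1, factor2)]

# Mirror the loop: a window missing any of its dates gives NA
medians[n_dates < window_len, window_median := NA_real_]

# Attach medians to every (date, factor1, factor2) key with an update join;
# keys with no matching window (the first date) are left NA
output <- unique(dt[, .(date, factor1, factor2)])
output[medians, window_median := i.window_median,
       on = .(date = target_date, factor1, factor2)]
```

This trades memory (7 copies of every row) for a single grouped `median()` call. It does not make the median itself incremental, which is the efficiency concern the answer raises.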

Solution

At the time of writing, data.table doesn't have any special features for rolling windows. There is more detail in my answer to a similar question:

Is there a fast way to run a rolling regression inside data.table?

Rolling median is interesting. Computing it efficiently needs a specialized function (same link as in an earlier comment):

Rolling median algorithm in C

The data.table solutions in the question and in the answers here are all very inefficient compared with a proper, specialized rolling-median function (which, as far as I know, isn't available for R).
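For a single ordered series with one value per step, base R's `stats::runmed()` does implement a running median. It uses a centered window of odd width `k`, with the Turlach or Stuetzle algorithm. The hypothetical helper `trailing_median()` below is a sketch that shifts the centered result so that each position gets the median of the `k` values strictly before it, the same "previous 7" shape as the question's window. It does not cover the question's pooled windows, where every date contributes many rows per group, so it is not a drop-in replacement there.

```r
# Hypothetical helper: trailing median of the k values before each position,
# built on base R's centered running median stats::runmed()
trailing_median <- function(x, k = 7L) {
  stopifnot(k %% 2L == 1L)          # runmed() requires an odd window width
  n   <- length(x)
  out <- rep(NA_real_, n)
  if (n <= k) return(out)           # no position has k earlier values
  centered <- stats::runmed(x, k, endrule = "keep")
  half <- (k - 1L) %/% 2L           # 3 when k = 7
  # centered[j] is the median of x[(j - half):(j + half)], so the trailing
  # window x[(i - k):(i - 1)] is the centered window at j = i - half - 1
  idx <- (k + 1L):n
  out[idx] <- centered[idx - half - 1L]
  out
}
```

The shifted indices only touch interior positions of the `runmed()` result, so the `endrule` choice does not affect the returned values.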
