Need faster rolling apply function with start to stop indices


Question

Below is a piece of code. It gives the percentile of the trade price level over a rolling 15-minute (historical) window. It runs quickly when the length is 500 or 1000, but as you can see there are 45K observations, and on the entire data set it is very slow. Can I apply any of the plyr functions? Any other suggestions are welcome.

This is what the trade data looks like:

> str(trade)
'data.frame':   45571 obs. of  5 variables:
 $ time    : chr  "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...
 $ prc     : num  121 121 121 121 121 ...
 $ siz     : int  1 4 1 2 3 3 2 2 3 4 ...
 $ aggress : chr  "B" "B" "B" "B" ...
 $ time.pos: POSIXlt, format: "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...

And this is what the data looks like after adding the new column trade$time.pos:

trade$time.pos <- strptime(trade$time, format="%Y-%m-%d %H:%M:%OS") 

> head(trade)
                     time      prc siz aggress                time.pos
1 2013-10-20 22:00:00.489 121.3672   1       B 2013-10-20 22:00:00.489
2 2013-10-20 22:00:00.807 121.3750   4       B 2013-10-20 22:00:00.807
3 2013-10-20 22:00:00.811 121.3750   1       B 2013-10-20 22:00:00.811
4 2013-10-20 22:00:00.811 121.3750   2       B 2013-10-20 22:00:00.811
5 2013-10-20 22:00:00.811 121.3750   3       B 2013-10-20 22:00:00.811
6 2013-10-20 22:00:00.811 121.3750   3       B 2013-10-20 22:00:00.811

# t_15_index returns the indices of the trades that were executed in the 15 minutes up to the current trade (t-15 to t).
t_15_index <- function(data_vector, index) {
  # explicit units so the comparison with 15 * 60 is always in seconds
  which(as.numeric(difftime(data_vector[index], data_vector[seq_len(index)], units = "secs")) <= 15 * 60)
}

get_percentile <- function(data) {
  # use the `data` argument throughout rather than the global `trade`
  len_d <- nrow(data)

  price_percentile <- numeric(len_d)

  for (i in seq_len(len_d)) {

    t_15 <- t_15_index(data$time.pos, i)
    # ecdf(rep(..)) gets the empirical distribution of prices in the window, weighting each price by its trade size
    price_dist <- ecdf(rep(data$prc[t_15], data$siz[t_15]))
    # percentile of the current price level within the current (t-15 to t) subset of data
    price_percentile[i] <- price_dist(data$prc[i])
  }
  data$price_percentile <- price_percentile
  data
}


res_trade = get_percentile(trade)

Recommended answer

Here's a quick stab at more efficiently finding the index of the time that occurred fifteen minutes ago:

# Create some sample data (modified from BrodieG)
set.seed(1)

ticks <- 45000
start <- as.numeric(as.POSIXct("2013-01-01"))
end <- as.numeric(as.POSIXct("2013-01-02"))
times <- as.POSIXct(runif(ticks, start, end), origin=as.POSIXct("1970-01-01"))
trade <- data.frame(
  time = sort(times),
  prc = 100 + rnorm(ticks, 0, 5),
  siz = sample(1:10, ticks, replace = TRUE)
)

# For a vector of times, find the index of the first time that falls within
# the last `minutes` minutes of each time. Assumes times are sorted
minutes_ago <- function(time, minutes = 15) {
  secs <- minutes * 60
  time <- as.numeric(time)
  out <- integer(length(time))

  before <- 1

  for(i in seq_along(out)) {
    while(time[before] < time[i] - secs) {
      before <- before + 1
    }
    out[i] <- before

  }
  out
}
system.time(minutes_ago(trade$time))
# Takes about 0.2s on my machine
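
The loop above is already linear in the number of ticks, but the same start indices can probably be obtained without an explicit loop. The following is a minimal sketch, assuming the times are sorted in increasing order; it relies on base R's `findInterval()` with `left.open = TRUE` (available in R 3.3.0 and later). The name `minutes_ago_fi` is made up for this illustration and is not part of the original answer.

```r
# Hypothetical helper (not from the original answer): vectorised lookup of
# the first index whose time lies within the last `minutes` minutes.
# Assumes `time` is sorted in increasing order.
minutes_ago_fi <- function(time, minutes = 15) {
  time <- as.numeric(time)
  # With left.open = TRUE, findInterval() counts the elements of `time`
  # that are strictly less than each cutoff; the next index starts the window.
  findInterval(time - minutes * 60, time, left.open = TRUE) + 1L
}

# Toy check: times in seconds, 15-minute (900 s) window
toy <- c(0, 100, 900, 1000, 1901)
minutes_ago_fi(toy)
# 1 1 1 2 5
```

On sorted input this should agree with `minutes_ago()`, which can be checked with `all.equal(minutes_ago(trade$time), minutes_ago_fi(trade$time))`.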

library(Rcpp)
cppFunction("IntegerVector minutes_ago2(NumericVector time, int minutes = 15) {
  int secs = minutes * 60;
  int n = time.size();
  IntegerVector out(n);

  int before = 0;
  for (int i = 0; i < n; ++i) {
    // Could do even better here with a binary search
    while(time[before] < time[i] - secs) {
      before++;
    }
    out[i] = before + 1;
  }
  return out;
}")

system.time(minutes_ago2(trade$time, 10))
# Takes less than 0.001s

all.equal(minutes_ago(trade$time, 15), minutes_ago2(trade$time, 15))
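
The functions above only return where each window starts. One way those indices might be plugged back into the percentile calculation from the question is sketched below. `rolling_percentile` is a hypothetical helper, not part of the original answer; it replaces `ecdf(rep(prc, siz))` with the equivalent size-weighted share, so the repeated vector is never built.

```r
# Hypothetical helper (not from the original answer): size-weighted
# percentile of each trade's price within its own trailing window.
# `start[i]` is the first index of the window that ends at trade i.
rolling_percentile <- function(prc, siz, start) {
  out <- numeric(length(prc))
  for (i in seq_along(prc)) {
    w <- start[i]:i  # indices of the window
    # Share of traded size at or below the current price; this equals
    # ecdf(rep(prc[w], siz[w]))(prc[i]) without materialising rep()
    out[i] <- sum(siz[w][prc[w] <= prc[i]]) / sum(siz[w])
  }
  out
}

# Toy data with hand-written start indices
prc   <- c(10, 12, 11, 12)
siz   <- c(1, 2, 1, 4)
start <- c(1, 1, 2, 3)
rolling_percentile(prc, siz, start)
```

With the sample data from the answer, this would be called as `rolling_percentile(trade$prc, trade$siz, minutes_ago(trade$time))`. The work per trade is still proportional to the window length, so a fully compiled version may be worth considering if this loop turns out to be the bottleneck.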
