R vectorized array data manipulation

Question

I think many more people will be interested in this subject. I have a specific task that I want to do in the most efficient way. My base data are:
- time indices of buy and sell signals
- on the diagonal of the time indices, the ROC (rate of change) between the closest buy-sell pairs:

r <- array(data = NA, 
           dim = c(5, 5), 
           dimnames = list(buy_idx = c(1,5,9,12,16), 
                           sell_idx = c(3,7,10,14,19)))
diag(r) <- c(1.04,0.97,1.07,1.21,1.1)

The task is to generate a moving compound ROC over every possible window (buy-sell pair). This is how I currently solve it:

for(i in 2:5){
  r[1:(i-1),i] <- r[1:(i-1),i-1] * r[i,i]
}

As long as I'm not calling this inside an outer loop, the running time of my solution is quite acceptable. Is there a way to turn this loop into a vectorized solution? Are there any good, well-documented tutorials for learning the vectorized way of thinking in R? That would be much more valuable than a one-time solution!

Edit 20130709:

The next task is closely related to the previous one: apply a tax to each transaction (tax given as a percentage). Current solution:

diag(r[,]) <- diag(r[,]) * ((1-(tax/100))^2)
for(i in 2:dim(r)[2]){
  r[1:(i-1),i] <- r[1:(i-1),i] * ((1-(tax/100))^(2*(i:2)))
}

Do you know a more efficient way? Or a more correct one, if this doesn't handle everything?

Recommended answer

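If d holds your diagonal elements, then for every j >= i, r[i,j] is prod(d[i:j]), which can also be written as prod(d[1:j]) / prod(d[1:(i-1)]) (with the denominator taken as 1 when i = 1, which the leading 1 in cumprod(c(1, d)) takes care of). Hence this trick, which uses the outer ratio of the cumulative product: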

d <- c(1.04,0.97,1.07,1.21,1.1)
n <- length(d)
p <- cumprod(c(1, d))
r <- t(outer(p, 1/p, "*"))[-n-1, -1]
r[lower.tri(r)] <- NA
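
As a quick sanity check, the sketch below compares the loop from the question with the cumulative-product version on the example diagonal. The names `r_loop` and `r_vec` are introduced here for illustration only.

```r
d <- c(1.04, 0.97, 1.07, 1.21, 1.1)
n <- length(d)

# Loop from the question: extend each column from the previous one.
r_loop <- matrix(NA_real_, n, n)
diag(r_loop) <- d
for (i in 2:n) {
  r_loop[1:(i - 1), i] <- r_loop[1:(i - 1), i - 1] * r_loop[i, i]
}

# Vectorized version: p[k + 1] holds prod(d[1:k]), with p[1] = 1.
p <- cumprod(c(1, d))
r_vec <- t(outer(p, 1 / p, "*"))[-n - 1, -1]
r_vec[lower.tri(r_vec)] <- NA

# Both agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(r_loop, r_vec)))
```

Entry `[i, j]` of `r_vec` is `p[j + 1] / p[i]`, which is why the last row and the first column of the outer product are dropped.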


Some benchmarks showing that it does better than the OP's loop for some (but not all) input sizes. It is faster at n = 10 and n = 100, but slower at n = 1000:

OP <- function(d) {
   r <- diag(d)
   for(i in 2:length(d)){
     r[1:(i-1),i] <- r[1:(i-1),i-1] * r[i,i]
   }
   r
}

flodel <- function(d) {
   n <- length(d)
   p <- cumprod(c(1, d))
   r <- t(outer(p, 1/p, "*"))[-n-1, -1]
   r[lower.tri(r)] <- NA
   r
}

library(microbenchmark)

d <- runif(10)
microbenchmark(OP(d), flodel(d))
# Unit: microseconds
#        expr     min       lq   median      uq     max
# 1 flodel(d)  83.028  85.6135  88.4575  90.153 144.111
# 2     OP(d) 115.993 122.0075 123.4730 126.826 206.892

d <- runif(100)
microbenchmark(OP(d), flodel(d))
# Unit: microseconds
#        expr      min       lq    median       uq      max
# 1 flodel(d)  490.819  545.528  549.6095  566.108  684.043
# 2     OP(d) 1227.235 1260.823 1282.9880 1313.264 3913.322

d <- runif(1000)
microbenchmark(OP(d), flodel(d))
# Unit: milliseconds
#        expr      min        lq    median        uq       max
# 1 flodel(d) 97.78687 106.39425 121.13807 133.99502 154.67168
# 2     OP(d) 53.49014  60.10124  72.56427  85.17864  91.89011
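
The slowdown at n = 1000 is not explained by the timings themselves. One plausible contributor, not verified here, is that `flodel` builds a full (n+1) × (n+1) matrix, transposes it, and then copies a subset. The sketch below builds the n × n result directly; `flodel2` is a hypothetical name introduced for illustration and has not been benchmarked. Note also that any ratio-of-cumulative-products approach divides by a running product, so inputs whose product underflows toward zero (as can happen with `runif(1000)`) may produce non-finite entries; ROC values close to 1 are far less prone to this.

```r
# Hypothetical variant of flodel() that skips the transpose and subsetting.
flodel2 <- function(d) {
  n <- length(d)
  p <- cumprod(c(1, d))
  # Entry [i, j] is p[j + 1] / p[i], i.e. prod(d[i:j]) for j >= i.
  r <- outer(1 / p[-(n + 1)], p[-1])
  r[lower.tri(r)] <- NA
  r
}

d <- c(1.04, 0.97, 1.07, 1.21, 1.1)
flodel2(d)[1, ]  # first row: cumulative products of d
```

If this is of interest, it can be added as a third expression in the `microbenchmark()` calls above to see whether it changes the picture on a given machine.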


Edit to answer the 20130709 addition:

I'll assume tax is a scalar and let z <- (1 - tax/100)^2. Your final result is r multiplied by a matrix of z raised to different powers. What you want to avoid is computing these powers over and over. Here is what I would do:

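z <- (1 - tax/100)^2            # tax factor per buy-sell pair; tax is a scalar in %
pow <- 1L + col(r) - row(r)     # number of buy-sell pairs in each window
pow[lower.tri(pow)] <- NA
tax.mult <- (z^(1:n))[pow]      # compute each power of z once, then look it up
r <- r * tax.mult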
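
To check that the lookup reproduces the loop from the question, here is a small self-contained sketch. The tax rate of 0.5% is an assumed value chosen only for illustration, and `r_loop` and `r_vec` are names introduced here.

```r
tax <- 0.5  # assumed tax rate in percent, for illustration only
d <- c(1.04, 0.97, 1.07, 1.21, 1.1)
n <- length(d)
z <- (1 - tax / 100)^2

# Untaxed compound ROC matrix (upper triangle), built as in the answer.
p <- cumprod(c(1, d))
r <- t(outer(p, 1 / p, "*"))[-n - 1, -1]
r[lower.tri(r)] <- NA

# Loop version from the question.
r_loop <- r
diag(r_loop) <- diag(r_loop) * z
for (i in 2:n) {
  r_loop[1:(i - 1), i] <- r_loop[1:(i - 1), i] * (1 - tax / 100)^(2 * (i:2))
}

# Vectorized version: look up precomputed powers of z by window length.
pow <- 1L + col(r) - row(r)
pow[lower.tri(pow)] <- NA
r_vec <- r * (z^(1:n))[pow]

# Both agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(r_loop, r_vec)))
```

Only n distinct powers of z are computed, however large the matrix gets; the rest is integer indexing.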
