如何使用Rcpp来加速for循环? [英] How to use Rcpp to speed up a for loop?

查看:164
本文介绍了如何使用Rcpp来加速for循环?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经创建了一个for循环,我想使用Rcpp库加速它。我不太熟悉C ++。你能帮我让我的功能更快吗?
感谢您的帮助!

我将算法,代码以及输入和输出都包含在sessionInfo中。



以下是我的算法:

如果当前价格高于之前的价格,如果当前价格低于前一个价格,那么名为TR的列中的标记(-1)为TR



如果当前价格与前一个价格相同,则
标记与TR列中前一个价格相同的东西b

这是我的代码:

  price <-c(71.91,71.82,71.81 ,71.81,71.81,71.82,71.81,71.81,71.81,
71.82,71.81,71.81,71.8,71.81,71.8,71.81,71.8,71.8,71.8,
71.8,71.81,71.81,71.81,71.81 71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.81,71.8,$ 71.81,71.81,71.81,71.81,71.81,71.81,71.81, 71.81,71.82,71.82,71.81,71.81,
71.81,71.81,71.81,71.81,71.81,71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.81,71.82,71.82,71.82, 71.82,71.82,
71.82,71.82,71.82,71.82,71.81,71.81,71.81,71.82,71.82,
71.81,71.82,71.82,71.82,71.81,71.82,71.82,71.82,71.81,
71.81,71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,
71.82,71.82, 71.82,71.82,71.82,71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.82,71.82, 71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.82,71.81,
71.82,71.82,71.82,71.82,71.83,71.82,71.82,71.82, 71.81,
71.81,71.81,71.81,71.81,71.81,71.81,71.82,71.82,71.82,
71.81,71.81,71.81,71.82,71.82,71.82,71.82,71.82,71.82,
71.82,71.82,71.82,71.82,71.82,71.83,71.83,71.83,71.83,
71.83,71.83,71.83 ,71.83,71.83,71.83,71.83,71.83,71.83,
71.83,71.83,71.83,71.83,71.83,71.83,71.83,71.83,71.83,
71.83,71.83,71.83,71.83,71.83,71.83 (长度(价格)-1)
TR < - c(NA,TR)$ b如果(价格[i] ==价格[i + 1]){TR [i + 1],则价格为
$ b(长度(价格)-1)){

1] = TR [i]}

if(price [i] < (price [i + 1]){TR [i + 1] = 1}

if(price [i + 1]){TR [i + 1] }

$ b $ / code $ / $ p

这里是我的输出:

  c(NA,-1,-1,-1,-1,1, - 1,-1,-1,1,-1,-1,-1,1,-1,1,
-1,-1,-1,-1,1,1,1,1, 1,1,1,1,1,1,-1,-1,-1,-1,
-1,-1,-1,-1,1,1,1,1,1, 1,-1,-1,-1,-1,-1,-1,-1,
1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,1,1,1,-1,-1,1,-1,1,-1,1,1,..., ,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 ,
1,1,1,1,1,1,1,1,1,1)

这里是 my sessionInfo:

 > sessionInfo()
R版本3.1.2(2014-10-31)
平台:x86_64-w64-mingw32 / x64(64位)

语言环境:
[1] = LC_COLLATE English_United States.1252 LC_CTYPE = English_United States.1252
[3] = LC_MONETARY English_United States.1252 LC_NUMERIC = C
[5] = LC_TIME English_United States.1252

附加基础包:
[1] stats graphics grDevices utils datasets methods base

其他附加软件包:
[1] data.table_1.9.4

通过命名空间加载(而不是附加):
[1] chron_2.3-45 plyr_1.8.1 Rcpp_0.11.1 reshape2_1.4 stringr_0.6.2 tools_3.1.2

解决方案


$)
cppFunction(
IntegerVector proc(NumericVector x){
const int n = x.size();
IntegerVector y(n);
y [0] = NA_INTEGER;
for(int i = 1; i if(x [i] == x [i-1])y [i] = y [i-1];
else if(x [i]> x [i-1])y [i] = 1;
else y [i] = -1;
}
return y;






像往常一样,你可以得到一个相当大的加速与Rcpp作为与基本R中的for循环相比:

pre code proc.for< - function(price){
TR< ; - 数字(长度(价格)-1)
TR < - c(NA,TR)
for(i in 1 :(长度(价格)-1)){
if (价格[i] <价格[i + 1]){TR [i](价格[i] ==价格[i + 1]){TR [i + 1] = TR [i]}
+1)= 1}
if(price [i]> price [i + 1]){TR [i + 1] = -1}
}
return(TR)
}
proc.aaron< - 函数(价格){
变化< - sign(diff(price))
good< - change!= 0
goodval < - 变化[好]
c(NA,goodval [cumsum(good)])
}
proc.jbaums< - function(price){
TR< (TR(0))] [find(TR(0))](TR == 0),其中(TR!= 0) )]
TR
}

all.equal(proc(price),proc.for(price),proc.aaron(price),proc.jbaums(price))
#[1] TRUE
library(microbenchmark)
microbenchmark(proc(price),proc.for(price),proc.aaron(price),proc.jbaums(price))
#单位:微秒
#expr min lq平均值UQ最大neval
#PROC(价格)1.871 2.5380 3.92111 3.1110 4.5880 15.318 100
#proc.for(价)408.200 448.2830 542.19766 484.1265 546.3255 1821.104 100
#proc.aaron(价格)23.916 25.5770 33.53259 31.5420 35.8575 190.372 100
#proc.jbaums(价格)33.536 38.8995 46.80109 43.4510 49.3555 112.306 100

我们看到比for循环和10倍的加速比100倍,而对于提供的向量,与向量化的替代相比,增加了100倍。

加速更显着,矢量(在这里测试的长度为100万):

$ $ $ $ $ $ $ $ $ $ $ $ price $ big $ - rep(price,times = 5000)
all.equal(proc(price.big),proc.for(price.big),proc.aaron(price.big),proc.jbaums(price.big))
#[1] TRUE
microbenchmark(proc(price.big),proc.for(price.big),proc.aaron(price.big),proc.jbaums(price.big))
#单位:毫秒
#EXPR分钟LQ平均中值最大UQ neval
#PROC(price.big)1.442119 1.818494 5.094274 2.020437 2.771903 56.54321 100
#proc.for(price.big)2639.819536 2699.493613 2949.962241 2781.636460 3062.277930 4472.35369 100
#proc.aaron(price.big)91.499940 99.859418 132.519296 140.521212 147.462259 207.72813 100个
#proc.jbaums(price.big)117.242451 138.528214 170.989065 170.606048 180.337074 487.13615 100

与矢量化R函数相比,现在我们的加速比for循环要快1000倍,速度提高了70倍。即使在这个尺寸下,如果函数只被调用一次,也不清楚Rcpp与矢量化R解决方案有多大的优势,因为它至少需要100 ms来编译Rcpp代码。如果这是在分析中重复调用的一段代码,那么加速非常有吸引力。


I have created a for loop and I would like to speed it up using the Rcpp library. I am not too familiar with C++. Can you please help me make my function faster? Thank you for your help!

I have included my algorithm, the code, along with the input and the output, with the sessionInfo.

Here is my algorithm:

if the current price is above the previous price, mark (+1) in the column named TR

if the current price is below the previous price, mark (-1) in the column named TR

if the current price is the same as the previous price, mark the same thing as in the previous price in the column named TR

Here is my code:

price <- c(71.91, 71.82, 71.81, 71.81, 71.81, 71.82, 71.81, 71.81, 71.81, 
           71.82, 71.81, 71.81, 71.8, 71.81, 71.8, 71.81, 71.8, 71.8, 71.8, 
           71.8, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 
           71.81, 71.82, 71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.8, 
           71.8, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.81, 71.81, 
           71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.81, 71.81, 71.81, 71.82, 71.82, 
           71.81, 71.82, 71.82, 71.82, 71.81, 71.82, 71.82, 71.82, 71.81, 
           71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 71.81, 
           71.82, 71.82, 71.82, 71.82, 71.83, 71.82, 71.82, 71.82, 71.81, 
           71.81, 71.81, 71.81, 71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 
           71.81, 71.81, 71.81, 71.82, 71.82, 71.82, 71.82, 71.82, 71.82, 
           71.82, 71.82, 71.82, 71.82, 71.82, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 71.83, 
           71.83)

TR <- numeric(length(price)-1)
TR <- c(NA,TR)

for (i in 1: (length(price)-1)){

  if (price[i] == price[i+1]) {TR[i+1] = TR[i]}

  if (price[i] < price[i+1]) {TR[i+1] = 1}

  if (price[i] > price[i+1]) {TR[i+1] = -1}

}

And here is my output: dput(TR) yields

c(NA, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 
  -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 
  -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, 
  1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 
  1, 1, -1, 1, 1, 1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
  -1, 1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

and here is my sessionInfo:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.4

loaded via a namespace (and not attached):
[1] chron_2.3-45  plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  stringr_0.6.2 tools_3.1.2  

解决方案

You can pretty directly translate the for loop:

library(Rcpp)
cppFunction(
"IntegerVector proc(NumericVector x) {
  const int n = x.size();
  IntegerVector y(n);
  y[0] = NA_INTEGER;
  for (int i=1; i < n; ++i) {
    if (x[i] == x[i-1]) y[i] = y[i-1];
    else if (x[i] > x[i-1]) y[i] = 1;
    else y[i] = -1;
  }
  return y;
}")

As is usual, you can get a pretty big speedup with Rcpp as compared to the for loop in base R:

proc.for <- function(price) {
  TR <- numeric(length(price)-1)
  TR <- c(NA,TR)
  for (i in 1: (length(price)-1)){
    if (price[i] == price[i+1]) {TR[i+1] = TR[i]}
    if (price[i] < price[i+1]) {TR[i+1] = 1}
    if (price[i] > price[i+1]) {TR[i+1] = -1}
  }
  return(TR)
}
proc.aaron <- function(price) {
  change <- sign(diff(price))
  good <- change != 0
  goodval <- change[good]
  c(NA, goodval[cumsum(good)])
}
proc.jbaums <- function(price) {
  TR <- sign(diff(price))
  TR[TR==0] <- TR[which(TR != 0)][findInterval(which(TR == 0), which(TR != 0))]
  TR
}

all.equal(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# [1] TRUE
library(microbenchmark)
microbenchmark(proc(price), proc.for(price), proc.aaron(price), proc.jbaums(price))
# Unit: microseconds
#                expr     min       lq      mean   median       uq      max neval
#         proc(price)   1.871   2.5380   3.92111   3.1110   4.5880   15.318   100
#     proc.for(price) 408.200 448.2830 542.19766 484.1265 546.3255 1821.104   100
#   proc.aaron(price)  23.916  25.5770  33.53259  31.5420  35.8575  190.372   100
#  proc.jbaums(price)  33.536  38.8995  46.80109  43.4510  49.3555  112.306   100

We see a speedup of more than 100x compared to a for loop and 10x compared to vectorized alternatives for the provided vector.

The speedup is even more significant with a larger vector (length 1 million tested here):

price.big <- rep(price, times=5000)
all.equal(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# [1] TRUE
microbenchmark(proc(price.big), proc.for(price.big), proc.aaron(price.big), proc.jbaums(price.big))
# Unit: milliseconds
#                    expr         min          lq        mean      median          uq        max neval
#         proc(price.big)    1.442119    1.818494    5.094274    2.020437    2.771903   56.54321   100
#     proc.for(price.big) 2639.819536 2699.493613 2949.962241 2781.636460 3062.277930 4472.35369   100
#   proc.aaron(price.big)   91.499940   99.859418  132.519296  140.521212  147.462259  207.72813   100
#  proc.jbaums(price.big)  117.242451  138.528214  170.989065  170.606048  180.337074  487.13615   100

Now we have a 1000x speedup compared to the for loop and a ~70x speedup compared to the vectorized R functions. Even at this size, it's not clear there's much advantage of Rcpp over vectorized R solutions if the function is only called once, since it certainly takes at least 100 ms to compile the Rcpp code. The speedup is pretty attractive if this is a piece of code that's called repeatedly in your analysis.

这篇关于如何使用Rcpp来加速for循环?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆