How to speed up or vectorize a for loop?

Problem description

I would like to increase the speed of my for loop via vectorization, data.table, or something else. I have to run the code on 1,000,000 rows and my code is really slow.

The code is fairly self-explanatory. I have included an explanation below just in case. I have included the input and the output of the function. Hopefully you will help me make the function faster.

My goal is to bin the vector "Volume", where each bin is equal to 100 shares. The vector "Volume" contains the number of shares traded. Here is what it looks like:

head(Volume, n = 60)
[1]  5  3  1  5  3  1  1  1  1  1  1  1 18  1  1 18  2  7 13  2  7 13  3  2  1  1  3  2  1  1  1
[32]  1  6  6  1  1  1  1  1  1  1  1 18  2  1  1  2  1 14 18  2  1  1  2  1 14  1  1  9  5

The vector "binIdexVector" is the same length as "Volume" and contains the bin number: each element in the first 100 shares gets the number 1, each element in the next 100 shares gets the number 2, each element in the next 100 shares gets the number 3, and so on. Here is what that vector looks like:

 head(binIdexVector, n = 60)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[48] 2 2 3 3 3 3 3 3 3 3 3 3 3

Here is my function:

#input as a vector
Volume<-c(5L, 3L, 1L, 5L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 18L, 1L, 1L, 
                   18L, 2L, 7L, 13L, 2L, 7L, 13L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 
                   1L, 1L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 18L, 2L, 1L, 
                   1L, 2L, 1L, 14L, 18L, 2L, 1L, 1L, 2L, 1L, 14L, 1L, 1L, 9L, 5L, 
                   2L, 1L, 1L, 1L, 1L, 9L, 5L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 
                   1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 9L, 9L, 3L, 3L, 1L, 1L, 
                   1L, 1L, 5L, 5L, 8L, 8L, 2L, 1L, 2L, 1L, 10L, 10L, 10L, 10L, 10L, 
                   10L, 10L, 10L, 9L, 9L, 1L, 1L, 8L, 1L, 8L, 1L, 8L, 8L, 2L, 1L, 
                   1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                   1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 
                   1L, 2L, 7L, 1L, 2L, 7L, 1L, 1L, 1L, 1L, 2L, 1L, 10L, 1L, 1L, 
                   1L, 1L, 1L, 1L, 2L, 1L, 10L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                   1L, 1L, 30L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
                   1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 
                   10L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 10L, 1L, 1L, 1L, 1L, 1L, 
                   1L, 1L, 1L, 1L, 1L, 30L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                   1L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
                   1L, 1L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 3L, 1L, 1L, 1L, 4L, 3L, 1L, 
                   1L, 1L, 4L, 25L, 1L, 1L, 25L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 
                   1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L)

binIdexVector <- numeric(length(Volume))

# initialize 
binIdex <-1
totalVolume <-0

for(i in seq_len(length(Volume))){

  totalVolume <- totalVolume + Volume[i]  

  if (totalVolume <= 100) {

    binIdexVector[i] <- binIdex

  } else {

    binIdex <- binIdex + 1
    binIdexVector[i] <- binIdex
    totalVolume <- Volume[i]
  }
}

# output:
> dput(binIdexVector)
c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
  2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
  3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
  4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
  6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
  7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
  7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
  8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
  8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 
  9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 
  10, 10, 10, 10, 10, 10, 10, 10, 10, 10)

Thanks a lot for your help!

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.1.2

Solution

You can use Rcpp when vectorization is difficult.

library(Rcpp)
cppFunction('
  IntegerVector bin(NumericVector Volume, int n) {
    IntegerVector binIdexVector(Volume.size());
    int binIdex = 1;
    double totalVolume =0;

    for(int i=0; i<Volume.size(); i++){
      totalVolume = totalVolume + Volume[i];
      if (totalVolume <= n) {
        binIdexVector[i] = binIdex;
      } else {
        binIdex++;
        binIdexVector[i] = binIdex;
        totalVolume = Volume[i];
      }
    }
    return binIdexVector;
  }')

all.equal(bin(Volume, 100), binIdexVector)
#[1] TRUE
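
To try this at the scale mentioned in the question, one option is to wrap the original loop in a plain R function and time it on simulated data. This is only a sketch: `bin_r` is a hypothetical name, the simulated trade sizes are made up, and no timings are claimed here since they depend on the machine and R version.

```r
# Pure-R reference implementation of the loop from the question.
# bin_r is a hypothetical name, not part of the original post.
bin_r <- function(Volume, n = 100) {
  binIdexVector <- integer(length(Volume))
  binIdex <- 1L
  totalVolume <- 0
  for (i in seq_along(Volume)) {
    totalVolume <- totalVolume + Volume[i]
    if (totalVolume > n) {
      # This trade overflows the current bin: open a new bin starting with it.
      binIdex <- binIdex + 1L
      totalVolume <- Volume[i]
    }
    binIdexVector[i] <- binIdex
  }
  binIdexVector
}

set.seed(1)                               # arbitrary seed, for reproducibility only
big <- sample(1:30, 1e6, replace = TRUE)  # simulated trade sizes, not real data

system.time(res_r <- bin_r(big, 100))

# If bin() from the Rcpp answer has been compiled in this session,
# the two can be compared directly:
# system.time(res_cpp <- bin(big, 100))
# all.equal(res_cpp, res_r)
```

Keeping a pure-R version around is mainly useful as a correctness check for the compiled function on inputs larger than the sample vector.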

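It's faster than findInterval(cumsum(Volume), seq(0, sum(Volume), by=100)), which of course gives an inexact answer: cutting the cumulative sum at fixed multiples of 100 ignores that the loop restarts each bin from the trade that overflowed the previous one.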
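
A toy input makes the difference between the two approaches concrete. The four-element vector below is an assumption chosen for illustration (it is not from the question); the loop is the same logic as in the question.

```r
Volume <- c(60L, 60L, 60L, 60L)  # toy data: every second trade overflows a bin

# Loop logic from the question: a bin restarts at the overflowing trade.
loop_bins <- integer(length(Volume))
binIdex <- 1L
totalVolume <- 0
for (i in seq_along(Volume)) {
  totalVolume <- totalVolume + Volume[i]
  if (totalVolume > 100) {
    binIdex <- binIdex + 1L
    totalVolume <- Volume[i]
  }
  loop_bins[i] <- binIdex
}

# Fixed cut points at 0, 100, 200 on the running total.
cum_bins <- findInterval(cumsum(Volume), seq(0, sum(Volume), by = 100))

loop_bins  # 1 2 3 4
cum_bins   # 1 2 2 3
```

The running totals are 60, 120, 180, 240. The loop opens a new bin at every trade because each pair sums to 120, while the fixed cut points put 120 and 180 in the same interval.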
