How to speed up or vectorize a for loop?
Problem description
I would like to increase the speed of my for loop via vectorization, data.table or something else. I have to run the code on 1,000,000 rows and my code is really slow.
The code is fairly self-explanatory, but I have included an explanation below just in case, along with the input and the output of the function. Hopefully you can help me make it faster.
My goal is to bin the vector "Volume", where each bin holds up to 100 shares. The vector "Volume" contains the number of shares traded. Here is what it looks like:
    > head(Volume, n = 60)
     [1]  5  3  1  5  3  1  1  1  1  1  1  1 18  1  1 18  2  7 13  2  7 13  3  2  1  1  3  2  1  1  1
    [32]  1  6  6  1  1  1  1  1  1  1  1 18  2  1  1  2  1 14 18  2  1  1  2  1 14  1  1  9  5
The vector "binIdexVector" has the same length as "Volume" and contains the bin number. The trades making up the first 100 shares get the number 1, those making up the next 100 shares get the number 2, those making up the next 100 shares get the number 3, and so on. A trade that would push the running total of the current bin above 100 starts the next bin. Here is what that vector looks like:
    > head(binIdexVector, n = 60)
     [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [48] 2 2 3 3 3 3 3 3 3 3 3 3 3
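Here is the rule applied to a tiny input. This is a minimal sketch that wraps the question's loop in a function so it can be tried on six made-up trades. The names bin_loop and cap are hypothetical and do not appear in the question. The logic matches the loop below: add the trade, and if the total now exceeds the capacity, open a new bin and restart the total from that trade.

```r
# Hypothetical helper (not from the question): the question's loop wrapped in
# a function, with the bin capacity as a parameter.
bin_loop <- function(v, cap = 100) {
  out <- integer(length(v))
  bin <- 1L
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]
    if (total > cap) {   # this trade overflows the current bin...
      bin <- bin + 1L    # ...so it opens the next bin
      total <- v[i]      # and the running total restarts from this trade
    }
    out[i] <- bin
  }
  out
}

# 40 + 50 = 90 fits; adding 20 gives 110, so bin 2 starts at the 20.
# 20 + 70 + 10 = 100 still fits (the test is <= 100); adding 5 overflows.
bin_loop(c(40, 50, 20, 70, 10, 5))
#> [1] 1 1 2 2 2 3
```

Because the comparison is <= 100, a bin that reaches exactly 100 shares still keeps the trade that filled it.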
Here is my function:
    # input as a vector
    Volume <- c(5L, 3L, 1L, 5L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 18L, 1L, 1L,
                18L, 2L, 7L, 13L, 2L, 7L, 13L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L,
                1L, 1L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 18L, 2L, 1L,
                1L, 2L, 1L, 14L, 18L, 2L, 1L, 1L, 2L, 1L, 14L, 1L, 1L, 9L, 5L,
                2L, 1L, 1L, 1L, 1L, 9L, 5L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L,
                1L, 1L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 9L, 9L, 3L, 3L,
                1L, 1L, 1L, 1L, 5L, 5L, 8L, 8L, 2L, 1L, 2L, 1L, 10L, 10L, 10L,
                10L, 10L, 10L, 10L, 10L, 9L, 9L, 1L, 1L, 8L, 1L, 8L, 1L, 8L, 8L,
                2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
                2L, 2L, 2L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
                1L, 1L, 5L, 5L, 1L, 2L, 7L, 1L, 2L, 7L, 1L, 1L, 1L, 1L, 2L,
                1L, 10L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 10L, 1L, 1L, 1L, 1L,
                1L, 1L, 1L, 1L, 1L, 1L, 30L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
                1L, 1L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L,
                2L, 2L, 1L, 2L, 1L, 10L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 10L,
                1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 30L, 1L, 2L, 1L, 1L,
                1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
                2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 7L, 7L,
                3L, 1L, 1L, 1L, 4L, 3L, 1L, 1L, 1L, 4L, 25L, 1L, 1L, 25L, 1L,
                1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L)

    binIdexVector <- numeric(length(Volume))

    # initialize
    binIdex <- 1
    totalVolume <- 0

    for (i in seq_len(length(Volume))) {
      totalVolume <- totalVolume + Volume[i]
      if (totalVolume <= 100) {
        binIdexVector[i] <- binIdex
      } else {
        binIdex <- binIdex + 1
        binIdexVector[i] <- binIdex
        totalVolume <- Volume[i]
      }
    }

    # output:
    > dput(binIdexVector)
    c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
    6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
    6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
    8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9,
    9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10)
Thanks a lot for your help!
    > sessionInfo()
    R version 3.1.2 (2014-10-31)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] tools_3.1.2
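Solution

You can use Rcpp when vectorization is difficult. Here it is difficult because each step depends on the previous one. The running total is reset whenever a trade overflows the current bin, so one bin boundary cannot be computed without knowing where the previous one fell.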
    library(Rcpp)

    cppFunction('
    IntegerVector bin(NumericVector Volume, int n) {
      // n is the bin capacity (100 shares in the question)
      IntegerVector binIdexVector(Volume.size());
      int binIdex = 1;
      double totalVolume = 0;
      for (int i = 0; i < Volume.size(); i++) {
        totalVolume = totalVolume + Volume[i];
        if (totalVolume <= n) {
          binIdexVector[i] = binIdex;
        } else {
          // this trade overflows the bin: start a new bin with it
          binIdex++;
          binIdexVector[i] = binIdex;
          totalVolume = Volume[i];
        }
      }
      return binIdexVector;
    }')

    all.equal(bin(Volume, 100), binIdexVector)
    #[1] TRUE
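It's faster than

    findInterval(cumsum(Volume), seq(0, sum(Volume), by = 100))

(which of course gives an inexact answer). The findInterval() version cuts the overall cumulative sum at fixed multiples of 100. The loop instead restarts the count at each trade that overflows a bin, so any bin that closes below 100 shares shifts all later boundaries.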
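If you want an exact result without compiling C++, one alternative is to iterate over bins instead of over trades. This is a sketch; it is not part of the original answer and has not been benchmarked. Start from the first trade of a bin, take a vectorized cumsum() over a short window of the following trades, and jump straight to the first one that overflows. The sketch assumes every trade is at least 1 share. Under that assumption, any cap + 1 consecutive trades exceed cap, so a window of that length always contains the overflow point. The name bin_by_window is hypothetical. The number of interpreted iterations equals the number of bins rather than the number of rows. Whether that beats the plain loop in practice depends on how many trades a typical bin holds.

```r
# Hypothetical helper (not from the original answer): an exact, bin-at-a-time
# version of the question's loop. Assumes every trade is >= 1 share, so any
# cap + 1 consecutive trades exceed cap and the overflow point always falls
# inside a window of that length.
bin_by_window <- function(v, cap = 100) {
  stopifnot(all(v >= 1))
  n <- length(v)
  out <- integer(n)
  if (n == 0) return(out)
  # Match the loop exactly: if the very first trade alone exceeds cap,
  # the loop numbers it bin 2, not bin 1.
  b <- if (v[1] > cap) 2L else 1L
  s <- 1                                  # index of the first trade in the current bin
  while (s <= n) {
    e <- min(s + cap, n)                  # window of at most cap + 1 trades
    k <- which(cumsum(v[s:e]) > cap)[1]   # first overflowing position in the window, or NA
    # NA: no overflow before the end of the data, so the rest is one bin.
    # k == 1: the opening trade alone exceeds cap, so it forms a bin by itself.
    len <- if (is.na(k)) e - s + 1 else max(k - 1, 1)
    out[s:(s + len - 1)] <- b
    s <- s + len
    b <- b + 1L
  }
  out
}

v <- c(40, 50, 20, 70, 10, 5)
bin_by_window(v)
#> [1] 1 1 2 2 2 3

# The findInterval() approximation, on the same toy input:
findInterval(cumsum(v), seq(0, sum(v), by = 100))
#> [1] 1 1 2 2 2 2
```

On this toy input the approximation already disagrees on the last trade.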