R - vectorised conditional replace
Problem description
Hi, I'm trying to manipulate a list of numbers, and I would like to do so without a for loop, using fast native operations in R. The pseudocode for the manipulation is:
By default, the starting total is 100 (for every block between zeros).
From one zero to the next zero, the moment the cumulative total falls more than 2% below its running maximum, replace all subsequent numbers with zero.
Do this for all blocks of numbers between zeros.
The cumulative sum resets to 100 every time.
For example, if the following were my data:
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1);
the result would be:
0 0 0 1 3 4 5 -1 2 3 -5 0 0 0 -2 -3 0 0 0 0 0 -1 -1 -1 0
Currently I have an implementation with a for loop, but since my vector is really long, the performance is terrible.
Thanks in advance.
Here is a running sample:
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
ans <- d
running_total <- 100
count <- 1
max <- 100
toggle <- FALSE
processing <- FALSE
for (i in d) {
  if (i != 0) {
    processing <- TRUE
    if (toggle == TRUE) {
      ans[count] = 0
    } else {
      running_total = running_total + i
      if (running_total > max) {
        max = running_total
      } else if (0.98 * max > running_total) {
        toggle <- TRUE
      }
    }
  }
  if (i == 0 && processing == TRUE) {
    running_total = 100
    max = 100
    toggle <- FALSE
  }
  count <- count + 1
}
cat(ans)
Solution
I am not sure how to translate your loop into vectorized operations. However, there are two fairly easy options for large performance improvements. The first is to simply put your loop into an R function and use the compiler package to precompile it. The second, slightly more complicated option is to translate your R loop into a C++ loop and use the Rcpp package to link it to an R function. You then call an R function that passes its input to fast C++ code. I show both options, with timings. I do want to gratefully acknowledge the help of Alexandre Bujard from the Rcpp listserv, who helped me with a pointer issue I did not understand.
First, here is your R loop as a function, foo.r.
## Your R loop as a function
foo.r <- function(d) {
  ans <- d
  running_total <- 100
  count <- 1
  max <- 100
  toggle <- FALSE
  processing <- FALSE
  for (i in d) {
    if (i != 0) {
      processing <- TRUE
      if (toggle == TRUE) {
        ans[count] <- 0
      } else {
        running_total <- running_total + i
        if (running_total > max) {
          max <- running_total
        } else if (0.98 * max > running_total) {
          toggle <- TRUE
        }
      }
    }
    if (i == 0 && processing == TRUE) {
      running_total <- 100
      max <- 100
      toggle <- FALSE
    }
    count <- count + 1
  }
  return(ans)
}
Now we can load the compiler package, compile the function, and call the result foo.rcomp.
## load compiler package and compile your R loop
require(compiler)
foo.rcomp <- cmpfun(foo.r)
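The same cmpfun() pattern can be tried in isolation. The sketch below uses a hypothetical toy function, sum.loop, that is not part of the original answer. One caveat worth knowing: since R 3.4.0 the just-in-time byte compiler is enabled by default, so ordinary functions are compiled automatically after a few calls, and the gain from an explicit cmpfun() may be much smaller on current R than in the timings reported here.

```r
library(compiler)

# Hypothetical toy loop, used only to illustrate cmpfun().
sum.loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v
  total
}

# cmpfun() returns a byte-compiled copy with the same behaviour.
sum.loop.comp <- cmpfun(sum.loop)

# A negative level queries the current JIT setting without changing it.
jit.level <- enableJIT(-1)

result <- sum.loop.comp(1:10)
```

If jit.level is greater than 0, the plain function is likely to be byte-compiled behind the scenes as well, which is the situation described in the caveat above.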
That is all it takes for the compilation route. It is all R and obviously very easy. Now for the C++ approach, we use the Rcpp package as well as the inline package, which allows us to "inline" the C++ code. That is, we do not have to make a source file and compile it; we just include it in the R code and the compilation is handled for us.
## load Rcpp package and inline for ease of linking
require(Rcpp)
require(inline)
## Rcpp version
src <- '
  const NumericVector xx(x);
  int n = xx.size();
  NumericVector res = clone(xx);
  int toggle = 0;
  int processing = 0;
  // double rather than int, so fractional inputs are not truncated
  double tot = 100;
  double max = 100;
  typedef NumericVector::iterator vec_iterator;
  vec_iterator ixx = xx.begin();
  vec_iterator ires = res.begin();
  for (int i = 0; i < n; i++) {
    if (ixx[i] != 0) {
      processing = 1;
      if (toggle == 1) {
        ires[i] = 0;
      } else {
        tot += ixx[i];
        if (tot > max) {
          max = tot;
        } else if (.98 * max > tot) {
          toggle = 1;
        }
      }
    }
    if (ixx[i] == 0 && processing == 1) {
      tot = 100;
      max = 100;
      toggle = 0;
    }
  }
  return res;
'
foo.rcpp <- cxxfunction(signature(x = "numeric"), src, plugin = "Rcpp")
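Rcpp also provides cppFunction(), which compiles a complete C++ function from a string without needing the inline package. The following is a minimal sketch of that alternative, assuming Rcpp and a working C++ toolchain are installed. The name foo_cpp is hypothetical, and the sketch drops the processing flag: resetting on every zero should give the same result as the loop above, because the state is already at its starting values before the first non-zero element.

```r
library(Rcpp)

# Hypothetical alternative to the inline/cxxfunction route above.
cppFunction('
NumericVector foo_cpp(NumericVector x) {
  int n = x.size();
  NumericVector res = clone(x);   // work on a copy, leave the input intact
  bool toggle = false;            // true once the 2% drop has been hit
  double tot = 100, peak = 100;   // running total and its running maximum
  for (int i = 0; i < n; i++) {
    if (x[i] != 0) {
      if (toggle) {
        res[i] = 0;               // zero everything after the trigger
      } else {
        tot += x[i];
        if (tot > peak) {
          peak = tot;
        } else if (0.98 * peak > tot) {
          toggle = true;
        }
      }
    } else {                      // a zero starts a new block
      tot = 100;
      peak = 100;
      toggle = false;
    }
  }
  return res;
}')

d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
res.cpp <- foo_cpp(d)
```

cppFunction() defines foo_cpp directly in the calling environment, so there is no separate signature() step.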
Now we can test that we get the expected results:
## demonstrate equivalence
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
all.equal(foo.r(d), foo.rcpp(d))
Finally, create a much larger version of d by repeating it 10^5 times. Then we can run the three different functions: pure R code, compiled R code, and the R function linked to C++ code.
## make larger vector to test performance
dbig <- rep(d, 10^5)
system.time(res.r <- foo.r(dbig))
system.time(res.rcomp <- foo.rcomp(dbig))
system.time(res.rcpp <- foo.rcpp(dbig))
On my system, this gives:
> system.time(res.r <- foo.r(dbig))
   user  system elapsed
  12.55    0.02   12.61
> system.time(res.rcomp <- foo.rcomp(dbig))
   user  system elapsed
   2.17    0.01    2.19
> system.time(res.rcpp <- foo.rcpp(dbig))
   user  system elapsed
   0.01    0.00    0.02
The compiled R code takes about 1/6 the time of the uncompiled R code, needing only about 2 seconds to operate on the vector of 2.5 million elements. The C++ code is orders of magnitude faster even than the compiled R code, requiring just 0.02 seconds to complete. Aside from the initial setup, the syntax for the basic loop is nearly identical in R and C++, so you do not even lose clarity. I suspect that even if parts or all of your loop could be vectorized in R, you would be hard pressed to beat the performance of the R function linked to C++. Lastly, just for proof:
> all.equal(res.r, res.rcomp)
[1] TRUE
> all.equal(res.r, res.rcpp)
[1] TRUE
The different functions return the same results.
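As for the loop-free form the question asked about, one possible sketch uses grouped cumsum() and cummax() through ave(). The function foo.vec and its start and tol parameters are hypothetical names, not part of the original answer, and the sketch assumes every zero starts a new block. It has not been benchmarked here: ave() splits the vector by block, so with very many short blocks it may well be slower than the Rcpp version.

```r
# Hypothetical helper (not from the original answer): a grouped, loop-free
# version of the same rule.
foo.vec <- function(d, start = 100, tol = 0.98) {
  # A new block starts at every zero, so a running count of zeros labels blocks.
  block <- cumsum(d == 0)
  # Running total within each block, starting from `start`.
  tot <- start + ave(d, block, FUN = cumsum)
  # Running maximum within each block, never below `start`.
  peak <- pmax(start, ave(tot, block, FUN = cummax))
  # 1 where the total sits more than 2% below the running maximum.
  breach <- as.numeric(tol * peak > tot)
  # Number of breaches strictly before each element, within its block.
  earlier <- ave(breach, block, FUN = cumsum) - breach
  # Zero every element that comes after the first breach in its block.
  d[earlier > 0] <- 0
  d
}

d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
res.vec <- foo.vec(d)
```

The running total keeps accumulating past the first breach here, unlike in the loop, but that does not change the output, since everything after the first breach in a block is zeroed anyway. With non-integer data the totals are summed in a slightly different order than in the loop, so values sitting exactly on the 2% boundary could in principle be classified differently.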