RCPP:我的距离矩阵程序比包中的函数慢 [英] Rcpp: my distance matrix program is slower than the function in package
问题描述
我想计算成对的欧式距离矩阵。我根据 Dirk Eddelbuettel 的建议编写了Rcpp程序,
I would like to calculate the pairwise euclidean distance matrix. I wrote Rcpp programs by the suggestion of Dirk Eddelbuettel as follows
NumericMatrix calcPWD1 (NumericMatrix x){
int outrows = x.nrow();
double d;
NumericMatrix out(outrows,outrows);
for (int i = 0 ; i < outrows - 1; i++){
for (int j = i + 1 ; j < outrows ; j ++){
NumericVector v1= x.row(i);
NumericVector v2= x.row(j);
NumericVector v3=v1-v2;
d = sqrt(sum(pow(v3,2)));
out(j,i)=d;
out(i,j)=d;
}
}
return (out) ;
}
但是我发现我的程序比 dist慢/ code>函数。
But I find my program is slower than dist
function.
> benchmark(as.matrix(dist(b)),calcPWD1(b))
test replications elapsed relative user.self sys.self user.child sys.child
1 as.matrix(dist(b)) 100 24.831 1.000 24.679 0.010 0 0
2 calcPWD1(b) 100 27.362 1.102 27.346 0.007 0 0
你们有什么建议吗?我的矩阵很简单。没有列名或行名,只是普通矩阵(例如 b = matrix(c(rnorm(1000 * 10)),1000,10)
)。
这是 dist
Do you guys have any suggestion? My matrix is very simple. There is no column names or row names, just plain matrix (for example like b=matrix(c(rnorm(1000*10)),1000,10)
).
Here is the program of dist
> dist
function (x, method = "euclidean", diag = FALSE, upper = FALSE,
p = 2)
{
if (!is.na(pmatch(method, "euclidian")))
method <- "euclidean"
METHODS <- c("euclidean", "maximum", "manhattan", "canberra",
"binary", "minkowski")
method <- pmatch(method, METHODS)
if (is.na(method))
stop("invalid distance method")
if (method == -1)
stop("ambiguous distance method")
x <- as.matrix(x)
N <- nrow(x)
attrs <- if (method == 6L)
list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], p = p, call = match.call(),
class = "dist")
else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], call = match.call(),
class = "dist")
.Call(C_Cdist, x, method, attrs, p)
}
<bytecode: 0x56b0d40>
<environment: namespace:stats>
我希望我的程序比 dist
快因为在 dist
中,需要检查的东西太多(例如方法
, diag
)。
I expect my program is faster than dist
since in dist
, there are too many thing to need to be checked (like method
, diag
).
推荐答案
Rcpp与内部R函数(C / Fortran)
首先,仅因为您使用Rcpp编写算法并不一定意味着它将击败R等效项,特别是如果R函数调用C或Fortran例程来执行大量的计算。在其他情况下,仅将函数编写为R时,很有可能在Rcpp中对其进行转换会产生所需的速度增益。
Rcpp vs. Internal R Functions (C/Fortran)
First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain.
请记住,在重写内部函数时,将与绝对疯狂的C程序员的R Core团队发生冲突。
Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers most likely will win out.
的基本实现,R使用的距离计算是在C中完成的,如以下所示:
Secondly, the distance calculation R uses is done in C as indicated by:
.Call(C_Cdist, x, method, attrs, p)
,这是 dist()$的最后一行c $ c>函数的R源。
此外, C实现会使用OpenMP 来并行化计算。
Furthermore, the C implementation uses OpenMP when available to parallelize the computation.
第三,通过稍微改变子集顺序并避免创建额外的变量,版本减少。
Thirdly, by changing the subset order slightly and avoiding creating an additional variable, the timings between versions decrease.
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
unsigned int outrows = x.nrow(), i = 0, j = 0;
double d;
Rcpp::NumericMatrix out(outrows,outrows);
for (i = 0; i < outrows - 1; i++){
Rcpp::NumericVector v1 = x.row(i);
for (j = i + 1; j < outrows ; j ++){
d = sqrt(sum(pow(v1-x.row(j), 2.0)));
out(j,i)=d;
out(i,j)=d;
}
}
return out;
}
这篇关于RCPP:我的距离矩阵程序比包中的函数慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!