RCPP：我的距离矩阵程序比包中的函数慢 [英] Rcpp: my distance matrix program is slower than the function in package

查看：128 发布时间：2020/9/27 4:53:07 c++ r rcpp

本文介绍了RCPP：我的距离矩阵程序比包中的函数慢的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想计算成对的欧式距离矩阵。我根据 Dirk Eddelbuettel 的建议编写了Rcpp程序，

I would like to calculate the pairwise euclidean distance matrix. I wrote Rcpp programs by the suggestion of Dirk Eddelbuettel as follows

NumericMatrix calcPWD1 (NumericMatrix x){
  int outrows = x.nrow();
  double d;
  NumericMatrix out(outrows,outrows);

  for (int i = 0 ; i < outrows - 1; i++){
    for (int j = i + 1  ; j < outrows ; j ++){
      NumericVector v1= x.row(i);
      NumericVector v2= x.row(j);
      NumericVector v3=v1-v2;
      d = sqrt(sum(pow(v3,2)));
      out(j,i)=d;
      out(i,j)=d;
    }
  }
  return (out) ;
}

但是我发现我的程序比 dist慢/ code>函数。


But I find my program is slower than dist function.
> benchmark(as.matrix(dist(b)),calcPWD1(b))
                test replications elapsed relative user.self sys.self user.child sys.child
1 as.matrix(dist(b))          100  24.831    1.000    24.679    0.010          0         0
2        calcPWD1(b)          100  27.362    1.102    27.346    0.007          0         0

你们有什么建议吗？我的矩阵很简单。没有列名或行名，只是普通矩阵（例如 b = matrix（c（rnorm（1000 * 10）），1000,10））。 
这是 dist  
Do you guys have any suggestion? My matrix is very simple. There is no column names or row names, just plain matrix (for example like b=matrix(c(rnorm(1000*10)),1000,10)).
Here is the program of dist
> dist
function (x, method = "euclidean", diag = FALSE, upper = FALSE, 
    p = 2) 
{
    if (!is.na(pmatch(method, "euclidian"))) 
        method <- "euclidean"
    METHODS <- c("euclidean", "maximum", "manhattan", "canberra", 
        "binary", "minkowski")
    method <- pmatch(method, METHODS)
    if (is.na(method)) 
        stop("invalid distance method")
    if (method == -1) 
        stop("ambiguous distance method")
    x <- as.matrix(x)
    N <- nrow(x)
    attrs <- if (method == 6L) 
        list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag, 
            Upper = upper, method = METHODS[method], p = p, call = match.call(), 
            class = "dist")
    else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag, 
        Upper = upper, method = METHODS[method], call = match.call(), 
        class = "dist")
    .Call(C_Cdist, x, method, attrs, p)
}
<bytecode: 0x56b0d40>
<environment: namespace:stats>

我希望我的程序比 dist 快因为在 dist 中，需要检查的东西太多（例如方法， diag ）。
I expect my program is faster than dist since in dist, there are too many thing to need to be checked (like method, diag).
推荐答案
 
 Rcpp与内部R函数（C / Fortran）
 
 
 首先，仅因为您使用Rcpp编写算法并不一定意味着它将击败R等效项，特别是如果R函数调用C或Fortran例程来执行大量的计算。在其他情况下，仅将函数编写为R时，很有可能在Rcpp中对其进行转换会产生所需的速度增益。 
Rcpp vs. Internal R Functions (C/Fortran)

First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain. 
请记住，在重写内部函数时，将与绝对疯狂的C程序员的R Core团队发生冲突。 
Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers most likely will win out. 
的基本实现，R使用的距离计算是在C中完成的，如以下所示：
Secondly, the distance calculation R uses is done in C as indicated by:
.Call(C_Cdist, x, method, attrs, p)

，这是 dist（）函数的R源。 

此外， C实现会使用OpenMP 来并行化计算。 
Furthermore, the C implementation uses OpenMP when available to parallelize the computation. 
第三，通过稍微改变子集顺序并避免创建额外的变量，版本减少。 
Thirdly, by changing the subset order slightly and avoiding creating an additional variable, the timings between versions decrease. 
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
  unsigned int outrows = x.nrow(), i = 0, j = 0;
  double d;
  Rcpp::NumericMatrix out(outrows,outrows);

  for (i = 0; i < outrows - 1; i++){
    Rcpp::NumericVector v1 = x.row(i);
    for (j = i + 1; j < outrows ; j ++){
      d = sqrt(sum(pow(v1-x.row(j), 2.0)));
      out(j,i)=d;
      out(i,j)=d;
    }
  }

  return out;
}


                        这篇关于RCPP：我的距离矩阵程序比包中的函数慢的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

RCPP：我的距离矩阵程序比包中的函数慢 [英] Rcpp: my distance matrix program is slower than the function in package

问题描述

推荐答案

Rcpp与内部R函数（C / Fortran）

Rcpp vs. Internal R Functions (C/Fortran)

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

RCPP：我的距离矩阵程序比包中的函数慢 [英] Rcpp: my distance matrix program is slower than the function in package

问题描述

推荐答案

Rcpp与内部R函数（C / Fortran）

Rcpp vs. Internal R Functions (C/Fortran)

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭