Why is this naive matrix multiplication faster than base R's?


Question

In R, matrix multiplication is very optimized, i.e. it is really just a call to BLAS/LAPACK. However, I'm surprised that this very naive C++ code for matrix-vector multiplication seems to be reliably 30% faster.

 library(Rcpp)

 # Simple C++ code for matrix multiplication
 mm_code = 
 "NumericVector my_mm(NumericMatrix m, NumericVector v){
   int nRow = m.rows();
   int nCol = m.cols();
   NumericVector ans(nRow);
   double v_j;
   for(int j = 0; j < nCol; j++){
     v_j = v[j];
     for(int i = 0; i < nRow; i++){
       ans[i] += m(i,j) * v_j;
     }
   }
   return(ans);
 }
 "
 # Compiling
 my_mm = cppFunction(code = mm_code)

 # Simulating data to use
 nRow = 10^4
 nCol = 10^4

 m = matrix(rnorm(nRow * nCol), nrow = nRow)
 v = rnorm(nCol)

 system.time(my_ans <- my_mm(m, v))
#>    user  system elapsed 
#>   0.103   0.001   0.103 
 system.time(r_ans <- m %*% v)
#>   user  system elapsed 
#>  0.154   0.001   0.154 

 # Double checking answer is correct
 max(abs(my_ans - r_ans))
 #> [1] 0
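
If you want to confirm that the gap is stable rather than a one-off timing, a repeated benchmark is one option. The sketch below is not part of the original post and assumes the microbenchmark package is installed:

 # Hedged sketch: repeat the comparison, since a single system.time() call
 # can be noisy on a matrix this large.
 library(microbenchmark)
 microbenchmark(
   naive = my_mm(m, v),
   base  = m %*% v,
   times = 10
 )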

Does base R's %*% perform some type of data check that I'm skipping over?

After understanding what's going on (thanks SO!), it's worth noting that this is a worst-case scenario for R's %*%, i.e. matrix by vector. For example, @RalfStubner pointed out that an RcppArmadillo implementation of a matrix-vector multiply is even faster than the naive implementation I demonstrated, and hence considerably faster than base R, but it is virtually identical to base R's %*% for a matrix-matrix multiply (when both matrices are large and square):

 arma_code <- 
   "arma::mat arma_mm(const arma::mat& m, const arma::mat& m2) {
    return m * m2;
  }"
 arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")

 nRow = 10^3 
 nCol = 10^3

 mat1 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)
 mat2 = matrix(rnorm(nRow * nCol), 
               nrow = nRow)

 system.time(arma_mm(mat1, mat2))
#>   user  system elapsed 
#>   0.798   0.008   0.814 
 system.time(mat1 %*% mat2)
#>   user  system elapsed 
#>   0.807   0.005   0.822  
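
For reference, a minimal sketch of the matrix-vector Armadillo version that @RalfStubner's point refers to might look like the following (the names arma_mv_code and arma_mv are illustrative, not from the original post):

 # Hedged sketch: an Armadillo matrix-vector multiply, assuming RcppArmadillo
 # is installed. Armadillo typically hands this off to the underlying BLAS
 # (dgemv) without the extra bookkeeping that base R's %*% performs.
 arma_mv_code <- 
   "arma::vec arma_mv(const arma::mat& m, const arma::vec& v) {
    return m * v;
  }"
 arma_mv = cppFunction(code = arma_mv_code, depends = "RcppArmadillo")

 # Usage, reusing m and v from the first example:
 # system.time(arma_ans <- arma_mv(m, v))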

So R's current (v3.5.0) %*% is near optimal for matrix-matrix, but could be significantly sped up for matrix-vector if you're okay skipping the checking.

Answer

A quick glance at names.c points you to do_matprod, the C function that is called by %*%; its code is found in the file array.c. (Interestingly, it turns out that both crossprod and tcrossprod dispatch to that same function as well.)

Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:

  1. Keeps row and column names, where that makes sense.
  2. Allows for dispatch to alternative S4 methods when the two objects being operated on by a call to %*% are of classes for which such methods have been provided. (That's what's happening in this portion of the function.)
  3. Handles both real and complex matrices.
  4. Implements a series of rules for how to handle multiplication of a matrix and a matrix, a vector and a matrix, a matrix and a vector, and a vector and a vector. (Recall that under cross-multiplication in R, a vector on the LHS is treated as a row vector, whereas on the RHS it is treated as a column vector; do_matprod is the code that makes that so. See the short example after this list.)
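
To make item 4 concrete, here is a small R illustration (not from the original answer; v_demo and m_demo are made-up names) of how the same vector is treated on either side of %*%:

 v_demo <- 1:3
 m_demo <- diag(3)
 dim(v_demo %*% m_demo)  # 1 x 3: v_demo is used as a row vector on the LHS
 dim(m_demo %*% v_demo)  # 3 x 1: v_demo is used as a column vector on the RHS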

Near the end of the function, it dispatches to either matprod or cmatprod. Interestingly (to me at least), in the case of real matrices, if either matrix might contain NaN or Inf values, then matprod dispatches to a function called simple_matprod, which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which, presumably, are faster, provided uniformly 'well-behaved' matrix elements can be guaranteed.
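
As a rough illustration of that dispatch, here is an R-level sketch (R actually does this in C inside array.c, and the helper name mm_with_check is made up here):

 # Hedged sketch of the pattern described above: scan for non-finite values,
 # fall back to a simple accumulation loop if any are found, otherwise hand
 # the work to the optimized BLAS path. The scan itself is the kind of extra
 # work the naive Rcpp loop never pays for.
 mm_with_check <- function(m, v) {
   if (anyNA(m) || any(is.infinite(m)) || anyNA(v) || any(is.infinite(v))) {
     ans <- numeric(nrow(m))
     for (j in seq_len(ncol(m))) {
       ans <- ans + m[, j] * v[j]   # column-by-column, like simple_matprod
     }
     ans
   } else {
     drop(m %*% v)                  # BLAS-backed path (dgemv under the hood)
   }
 }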
