Why is this naive matrix multiplication faster than base R's?
Question
In R, matrix multiplication is very optimized, i.e. is really just a call to BLAS/LAPACK. However, I'm surprised this very naive C++ code for matrix-vector multiplication seems reliably 30% faster.
library(Rcpp)

# Simple C++ code for matrix-vector multiplication
mm_code =
  "NumericVector my_mm(NumericMatrix m, NumericVector v){
    int nRow = m.rows();
    int nCol = m.cols();
    NumericVector ans(nRow);
    double v_j;
    for(int j = 0; j < nCol; j++){
      v_j = v[j];
      for(int i = 0; i < nRow; i++){
        ans[i] += m(i,j) * v_j;
      }
    }
    return(ans);
  }"
# Compiling
my_mm = cppFunction(code = mm_code)

# Simulating data to use
nRow = 10^4
nCol = 10^4
m = matrix(rnorm(nRow * nCol), nrow = nRow)
v = rnorm(nCol)

system.time(my_ans <- my_mm(m, v))
#>    user  system elapsed
#>   0.103   0.001   0.103
system.time(r_ans <- m %*% v)
#>    user  system elapsed
#>   0.154   0.001   0.154

# Double checking the answer is correct
max(abs(my_ans - r_ans))
#> [1] 0
Does base R's %*% perform some type of data check that I'm skipping over?
After understanding what's going on (thanks SO!), it's worth noting that this is a worst-case scenario for R's %*%, i.e. matrix by vector. For example, @RalfStubner pointed out that an RcppArmadillo implementation of matrix-vector multiplication is even faster than the naive implementation I demonstrated, implying it is considerably faster than base R, yet it is virtually identical to base R's %*% for matrix-matrix multiplication (when both matrices are large and square):
arma_code <-
  "arma::mat arma_mm(const arma::mat& m, const arma::mat& m2) {
    return m * m2;
  }"
arma_mm = cppFunction(code = arma_code, depends = "RcppArmadillo")
nRow = 10^3
nCol = 10^3
mat1 = matrix(rnorm(nRow * nCol), nrow = nRow)
mat2 = matrix(rnorm(nRow * nCol), nrow = nRow)
system.time(arma_mm(mat1, mat2))
#>    user  system elapsed
#>   0.798   0.008   0.814
system.time(mat1 %*% mat2)
#>    user  system elapsed
#>   0.807   0.005   0.822
So R's current (v3.5.0) %*% is near optimal for matrix-matrix multiplication, but could be significantly sped up for matrix-vector multiplication if you're okay with skipping the checking.
Answer
A quick glance in names.c (here in particular) points you to do_matprod, the C function that is called by %*% and which is found in the file array.c. (Interestingly, it turns out that both crossprod and tcrossprod dispatch to that same function as well.) Here is a link to the code of do_matprod.
Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:
- Keeps row and column names, where that makes sense.
- Allows for dispatch to alternative S4 methods when the two objects being operated on by a call to %*% are of classes for which such methods have been provided. (That's what's happening in this portion of the function.)
- Handles both real and complex matrices.
- Implements a series of rules for how to handle multiplication of a matrix and a matrix, a vector and a matrix, a matrix and a vector, and a vector and a vector. (Recall that under cross-multiplication in R, a vector on the LHS is treated as a row vector, whereas on the RHS, it is treated as a column vector; this is the code that makes that so.)
Near the end of the function, it dispatches to either matprod or cmatprod. Interestingly (to me at least), in the case of real matrices, if either matrix might contain NaN or Inf values, then matprod dispatches (here) to a function called simple_matprod, which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which, presumably, are faster if uniformly 'well-behaved' matrix elements can be guaranteed.