R: any faster R function than "tcrossprod" for symmetric dense matrix multiplication?
Question
Let
x = matrix(rnorm(1000000), nrow = 5000)
I would like to compute the matrix product of x with its transpose: x %*% t(x).
After some googling, I found that a possibly faster way of doing this is
tcrossprod(x)
The time it takes is:
user system elapsed
2.975 0.000 2.960
Is there any other R function that can do this faster than tcrossprod?
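A minimal sketch for checking on your own machine that tcrossprod(x) returns the same matrix as x %*% t(x), and for timing both. The seed is an arbitrary choice; the timings you get depend on your CPU and on which BLAS R is linked to.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
x <- matrix(rnorm(1000000), nrow = 5000)      # 5000 x 200, as in the question

# General product: forms t(x) explicitly, then multiplies
time_naive <- system.time(a <- x %*% t(x))
# tcrossprod(x) computes x %*% t(x) without forming t(x)
time_tcp <- system.time(b <- tcrossprod(x))

# Both give the same matrix up to floating-point rounding
stopifnot(isTRUE(all.equal(a, b)))

# Elapsed (wall-clock) seconds for each approach
print(c(naive = time_naive[["elapsed"]], tcrossprod = time_tcp[["elapsed"]]))
```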
Answer
No. At the R level, this is already the fastest option. Internally, tcrossprod calls the level-3 BLAS routine dsyrk, so if you use a high-performance BLAS library, it will be a lot faster. Try linking OpenBLAS to your R.
Linking a BLAS library does not require rebuilding R. You can read my question "linking R to BLAS library" for an overview; it contains several links showing how to set up aliases and then switch between the different BLAS libraries on your machine.
Alternatively, you can read my (extremely long) question and answer "Without root access, run R with tuned BLAS when it is linked with reference BLAS", which gives various ways to use an external BLAS library even if R is linked to the reference BLAS library.
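To confirm which BLAS your R session is actually using, before and after switching, one option is the sketch below. It assumes R 3.4.0 or later, where sessionInfo() reports the BLAS and LAPACK library paths. The grepl() check is only a heuristic: it assumes the library path contains the string "openblas", which is typical but not guaranteed.

```r
# sessionInfo() reports the BLAS/LAPACK shared libraries (R >= 3.4.0)
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")

# Heuristic only: the path usually, but not always, names the library
uses_openblas <- grepl("openblas", si$BLAS, ignore.case = TRUE)
cat("Looks like OpenBLAS:", uses_openblas, "\n")
```

If this still points at the reference BLAS (often a path ending in libRblas or libblas) after you think you have switched, the alias or library path change has not taken effect for this R session.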
As a side note, for a matrix of dimension m * n, dsyrk has a FLOP count of n * m ^ 2, because it computes only one triangle of the symmetric m * m result. (Note that this is the computational cost for tcrossprod; for crossprod it is m * n ^ 2.)
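The FLOP counts above can be reproduced with a little arithmetic. The helper names below are hypothetical, for illustration only. tcrossprod(x) produces an m * m symmetric result of which dsyrk computes one triangle, i.e. m * (m + 1) / 2 entries, each a length-n dot product (n multiplies and n adds, counted as 2 * n flops). That gives n * m * (m + 1) flops, approximately n * m ^ 2; crossprod(x) follows by swapping the roles of m and n.

```r
# Hypothetical helpers: flops for a symmetric rank-k update (dsyrk),
# counting one multiply plus one add as 2 flops.
# tcrossprod(x): m x m result, one triangle = m * (m + 1) / 2 entries,
# each a dot product of length n.
flops_tcrossprod <- function(m, n) 2 * n * m * (m + 1) / 2
# crossprod(x): n x n result, one triangle = n * (n + 1) / 2 entries,
# each a dot product of length m.
flops_crossprod <- function(m, n) 2 * m * n * (n + 1) / 2

m <- 5000; n <- 200
print(c(exact = flops_tcrossprod(m, n), approx = n * m^2))
print(c(exact = flops_crossprod(m, n), approx = m * n^2))
```

For the question's 5000 * 200 matrix the exact and approximate tcrossprod counts differ by only 0.02%, which is why the n * m ^ 2 shorthand is good enough for rate estimates.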
You have m = 5000 and n = 200, and the computation takes 2.96 s. Thus, the computation runs at (200 * 5000 ^ 2 / 2.96) * 1e-9 ≈ 1.69 GFLOPS. That is an ordinary level of performance, so at the moment you are almost certainly using the reference BLAS. With OpenBLAS, performance can reach 10 GFLOPS or more, depending on your CPU. Good luck!
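The same rate calculation can be scripted to compare BLAS setups. This is a rough sketch, assuming the n * m ^ 2 flop count for dsyrk and using the wall-clock elapsed time from system.time(); the function name tcrossprod_gflops is hypothetical, and a single run can give noisy results.

```r
# Rough GFLOPS estimate for tcrossprod() on this machine
# (hypothetical helper; assumes the dsyrk flop count n * m^2)
tcrossprod_gflops <- function(m = 5000, n = 200) {
  x <- matrix(rnorm(m * n), nrow = m)
  elapsed <- system.time(tcrossprod(x))[["elapsed"]]
  (n * m^2 / elapsed) * 1e-9
}

# The question's numbers: 200 * 5000^2 flops in 2.96 s
print((200 * 5000^2 / 2.96) * 1e-9)
# Measure your own setup (repeat a few times; timings vary between runs)
print(tcrossprod_gflops())
```

Running it once with the reference BLAS and once with OpenBLAS gives a direct, like-for-like comparison on the question's matrix size.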