R: any faster R function than "tcrossprod" for symmetric dense matrix multiplication?

Question

x = matrix(rnorm(1000000), nrow = 5000)

I would like to multiply this matrix by its transpose: x %*% t(x).

After googling, I found that a possibly faster way of doing this is

tcrossprod(x)

The time taken is

 user  system elapsed 
2.975   0.000   2.960

Is there any other R function that can do the task faster than the one above?

Recommended answer

No. At R level this is already the fastest. Internally, though, it calls the level-3 BLAS routine dsyrk, so with a high-performance BLAS library it will be a lot faster. Try linking OpenBLAS to your R.
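As a rough illustration of the R-level comparison, the sketch below times both forms and checks that they agree. It uses a smaller matrix than the one in the question (2000 x 200, an arbitrary choice) so that it runs quickly; the timings depend entirely on the machine and on the BLAS that R is linked to.

```r
# Smaller than the 5000 x 200 matrix in the question, so the sketch runs quickly.
set.seed(1)
x <- matrix(rnorm(2000 * 200), nrow = 2000)

# General matrix product, with an explicit transpose of x.
t_naive <- system.time(a <- x %*% t(x))

# Same product via tcrossprod, which can exploit the symmetry of the result.
t_tcross <- system.time(b <- tcrossprod(x))

# Both should give the same 2000 x 2000 matrix, up to floating-point rounding.
same <- isTRUE(all.equal(a, b))

print(rbind(naive = t_naive[1:3], tcrossprod = t_tcross[1:3]))
print(same)
```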

Linking a BLAS library does not require rebuilding R. For an overview, see my question "linking R to BLAS library", which contains several links showing how to set up an alias and then switch between different BLAS libraries on the machine.
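Before and after switching, it helps to check which BLAS the running R session is actually using. A minimal sketch, assuming R 3.4.0 or later (where sessionInfo() and extSoftVersion() report BLAS information); on some platforms the reported path may be an empty string:

```r
# Path of the BLAS shared library used by this R session (assumes R >= 3.4.0).
blas_path <- extSoftVersion()[["BLAS"]]
print(blas_path)

# sessionInfo() reports the BLAS and LAPACK libraries as well.
info <- sessionInfo()
print(info$BLAS)
print(info$LAPACK)
```

If the path points at R's own libRblas or a generic libblas, R is probably still on the reference BLAS; a path mentioning openblas suggests the switch worked.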

Alternatively, you can read my very long question and answer "Without root access, run R with tuned BLAS when it is linked with reference BLAS", which gives various ways to use an external BLAS library even when R is linked to the reference BLAS library.

As a side note, for a matrix of dimension m * n, dsyrk has a FLOP count of n * m ^ 2. (This is the computational cost of tcrossprod. For crossprod it is m * n ^ 2.)

You have m = 5000 and n = 200, and the computation takes 2.96s. The computation speed is therefore (200 * 5000 ^ 2 / 2.96) * 1e-9 ≈ 1.69 GFLOPs. That is an ordinary level of performance, so at the moment you are almost certainly using the reference BLAS. With OpenBLAS, performance can reach 10 GFLOPs or more, depending on your CPU. Good luck!
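The arithmetic above can be reproduced in R. The helper name gflops is hypothetical, introduced only for this sketch, and the second part uses an arbitrary smaller matrix to estimate the rate on your own machine:

```r
# GFLOPs for tcrossprod on an m x n matrix, using the FLOP count n * m^2.
# `gflops` is a hypothetical helper name used only in this sketch.
gflops <- function(m, n, elapsed) n * m^2 / elapsed * 1e-9

# Figures from the question: m = 5000, n = 200, elapsed time 2.96 s.
gflops_reported <- gflops(m = 5000, n = 200, elapsed = 2.96)
print(gflops_reported)

# Rough estimate for the local machine. A single short run is noisy,
# so treat the result as indicative only.
x <- matrix(rnorm(2000 * 200), nrow = 2000)
elapsed <- system.time(tcrossprod(x))[["elapsed"]]
gflops_local <- gflops(nrow(x), ncol(x), elapsed)
print(gflops_local)
```

For a steadier local figure, repeat the timing several times and take the median.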
