Is it possible to use BLAS to speed up sparse matrix multiplication?


Question


I am currently trying to speed up my large sparse (scipy) matrix multiplications. I have successfully linked my numpy installation with OpenBLAS and henceforth, also scipy. I have run these tests with success.


When I use numpy.dot(X, Y) I can clearly see performance boosts and also that multiple cores are used simultaneously. However, when I use scipy's dot functionality, no such performance boost can be seen and only one core is used. For example:

import numpy
import scipy.sparse

x = scipy.sparse.csr_matrix(numpy.random.random((1000, 1000)))
x.dot(x.T)


Does anyone know how I can make BLAS also work with scipy's dot functionality?

Answer


BLAS is only used for dense floating-point matrices. Matrix multiplication of a scipy.sparse.csr_matrix is done by pure C++ functions that make no calls to external BLAS libraries.
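To make the distinction concrete, here is a small sketch contrasting the two code paths: numpy.dot on dense arrays dispatches to the linked BLAS (OpenBLAS in the questioner's setup), while csr_matrix.dot runs scipy's own single-threaded C++ routines. Both compute the same product; only the dense path benefits from BLAS. The 200×200 size is arbitrary, chosen just to keep the example fast.

```python
import numpy
import scipy.sparse

# Dense product: numpy.dot on 2-D arrays dispatches to the linked BLAS.
a = numpy.random.random((200, 200))
dense_result = numpy.dot(a, a.T)

# Sparse product: csr_matrix.dot runs scipy's own C++ code, not BLAS.
s = scipy.sparse.csr_matrix(a)
sparse_result = s.dot(s.T).toarray()

# Both paths compute the same product.
print(numpy.allclose(dense_result, sparse_result))
```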


For example, matrix-matrix multiplication is implemented in scipy's C++ sparsetools, in csr_matmat_pass_1 and csr_matmat_pass_2.


Optimised BLAS libraries are highly tuned to make efficient use of CPU caches by decomposing the dense input matrices into smaller block matrices in order to achieve better locality-of-reference. My understanding is that this strategy can't be easily applied to sparse matrices, where the non-zero elements may be arbitrarily distributed within the matrix.
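One practical consequence: if a matrix stored in sparse format is actually fairly dense, converting it to a dense array and letting BLAS-backed numpy.dot do the work can be faster than the sparse routines. The sketch below illustrates this idea; the multiply helper and the 0.1 density threshold are illustrative choices of mine, not part of scipy, and the threshold that actually pays off depends on the hardware and matrix sizes.

```python
import numpy
import scipy.sparse

def multiply(x, y, density_threshold=0.1):
    """Multiply two sparse matrices, falling back to dense, BLAS-backed
    numpy.dot when the left operand is dense enough for that to pay off.
    The 0.1 threshold is an illustrative guess, not a tuned value."""
    density = x.nnz / (x.shape[0] * x.shape[1])
    if density > density_threshold:
        # Dense path: this dot is handed off to the linked BLAS.
        return numpy.dot(x.toarray(), y.toarray())
    # Sparse path: scipy's C++ sparse-sparse multiplication.
    return x.dot(y).toarray()

# Usage: a genuinely sparse matrix stays on the sparse path.
x = scipy.sparse.random(500, 500, density=0.01, format='csr')
result = multiply(x, x)
```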

