Multithreaded BLAS in Python/NumPy
Question
I am trying to implement a large number of matrix-matrix multiplications in Python. Initially, I assumed that NumPy would automatically use my threaded BLAS libraries, since I built it against those libraries. However, when I look at top or a similar tool, it seems that the code does not use threading at all.
Any ideas what is wrong, or what I can do to easily get BLAS performance?
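One way to check the setup described in the question is sketched below, assuming a reasonably recent NumPy (default_rng needs 1.17 or later). np.show_config() prints the BLAS/LAPACK libraries NumPy was built against, and a batch of large np.dot products gives a BLAS-bound workload to watch in top. The helper name multiply_many and the matrix sizes are illustrative choices, not part of the original question.

```python
import time

import numpy as np


def report_blas_config():
    # Print NumPy's build configuration, including the BLAS/LAPACK
    # libraries it was linked against (the format varies by NumPy version).
    np.show_config()


def multiply_many(mats_a, mats_b):
    # Hypothetical helper: multiply matching pairs of matrices.
    # np.dot on 2-D float arrays dispatches to the linked BLAS,
    # which is where a threaded BLAS can use several cores.
    return [np.dot(a, b) for a, b in zip(mats_a, mats_b)]


if __name__ == "__main__":
    report_blas_config()
    rng = np.random.default_rng(0)
    n = 1500  # illustrative size: large enough to keep BLAS busy for a while
    mats_a = [rng.standard_normal((n, n)) for _ in range(4)]
    mats_b = [rng.standard_normal((n, n)) for _ in range(4)]
    start = time.perf_counter()
    results = multiply_many(mats_a, mats_b)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} products of {n}x{n} matrices in {elapsed:.2f} s")
```

While the script runs, a threaded BLAS would typically show the Python process using more than one core in top; if it stays near one core, the configuration output is the first place to look.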
Answer
Not all of NumPy uses BLAS, only some functions: specifically dot(), vdot(), and innerproduct(), and several functions from the numpy.linalg module. Also note that many NumPy operations are limited by memory bandwidth for large arrays, so an optimised implementation is unlikely to give any improvement. Whether multi-threading can give better performance when you are limited by memory bandwidth depends heavily on your hardware.
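The distinction between BLAS-backed and memory-bound operations can be illustrated with a rough timing sketch. It assumes float64 arrays and uses the standard flop counts (about 2n³ for an n×n matrix product, n² for an elementwise add). The helper names best_time and flop_rates are hypothetical, and the measured rates will vary with CPU, memory system and BLAS build.

```python
import time

import numpy as np


def best_time(fn, repeats=3):
    # Return the fastest of several wall-clock runs to reduce noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def flop_rates(n=1500):
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    # np.dot on 2-D float64 arrays goes to BLAS: about 2*n**3 flops on
    # 3*n*n values, so it is compute-bound and can use a threaded BLAS.
    t_dot = best_time(lambda: np.dot(a, b))
    # Elementwise addition does n*n flops while streaming 3*n*n values
    # (two reads, one write); it does not use BLAS and is usually
    # limited by memory bandwidth for large arrays.
    t_add = best_time(lambda: np.add(a, b))
    return {"dot": 2 * n**3 / t_dot, "add": n * n / t_add}


if __name__ == "__main__":
    for name, rate in flop_rates().items():
        print(f"{name}: {rate / 1e9:.2f} GFLOP/s")
```

On most machines one would expect the dot rate to be far higher than the add rate; only the first kind of operation gains from a threaded BLAS, which is consistent with the answer's point that many large-array operations are bandwidth-bound.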