Multithreaded BLAS in Python/NumPy

Question

I am trying to implement a large number of matrix-matrix multiplications in Python. Initially, I assumed that NumPy would automatically use my threaded BLAS libraries, since I built it against them. However, when I watch the process in top or a similar tool, the code does not appear to use multiple threads at all.

Any ideas what is wrong, or what I can do to easily get BLAS performance?

Recommended answer

Not all of NumPy uses BLAS, only some functions: specifically dot(), vdot(), and innerproduct() (numpy.inner() in current NumPy), plus several functions from the numpy.linalg module. Element-wise arithmetic, slicing, and reductions do not go through BLAS, so a threaded BLAS cannot speed them up. Also note that many NumPy operations on large arrays are limited by memory bandwidth rather than by computation, so an optimised implementation is unlikely to give any improvement for them. Whether multi-threading helps when you are limited by memory bandwidth depends heavily on your hardware.
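One way to check this on your own machine is a small timing script. It is a minimal sketch, not a benchmark: it compares np.dot() (which goes through BLAS) with an element-wise addition (which does not). The helper names best_time and compare are made up for this illustration, and the thread count of 4 is an arbitrary assumption. The environment variables are the ones commonly read by OpenMP-based builds, OpenBLAS, and MKL; whether your build honours any of them depends on which BLAS NumPy was actually linked against, which np.show_config() reports.

```python
import os

# Assumption: one of these variables is read by your BLAS backend.
# They only take effect if set before NumPy is imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "4")  # 4 is an arbitrary example value

import time

import numpy as np


def best_time(func, repeats=3):
    """Return the fastest wall-clock time of func() over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def compare(n=1000):
    """Time a BLAS-backed product against a non-BLAS element-wise sum."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    return {
        "dot": best_time(lambda: np.dot(a, b)),  # dispatched to BLAS
        "add": best_time(lambda: a + b),         # plain NumPy loop, memory bound
    }


if __name__ == "__main__":
    np.show_config()  # prints the BLAS/LAPACK libraries NumPy was built with
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Run it with a larger n while watching top. If the dot step never uses more than one core, NumPy is probably not linked against a threaded BLAS, or the thread count is being limited elsewhere. The add step is expected to stay on one core regardless.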
