Compiling numpy with OpenBLAS integration
Question
I am trying to install numpy with OpenBLAS, but I am at a loss as to how the site.cfg file needs to be written.
I followed the installation procedure and the installation completed without errors. However, performance degrades when I increase the number of threads used by OpenBLAS above 1 (controlled by the environment variable OMP_NUM_THREADS).
I am not sure whether the OpenBLAS integration is working correctly. Could anyone provide a site.cfg file that achieves this?
P.S.: On the same machine, OpenBLAS integration in other Python-based toolkits such as Theano gives a substantial performance boost when the number of threads is increased.
Recommended answer
I just compiled numpy inside a virtualenv with OpenBLAS integration, and it seems to be working OK.
This was my process:

Compile OpenBLAS:
$ git clone https://github.com/xianyi/OpenBLAS
$ cd OpenBLAS && make FC=gfortran
$ sudo make PREFIX=/opt/OpenBLAS install
If you don't have admin rights, you could set PREFIX= to a directory where you have write privileges (just modify the corresponding steps below accordingly).
Make sure that the directory containing libopenblas.so is in your shared library search path.
- To do this locally, you could edit your ~/.bashrc file to contain the line
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).
- Another option that will work for multiple users is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib, e.g.:
$ sudo sh -c "echo '/opt/OpenBLAS/lib' > /etc/ld.so.conf.d/openblas.conf"
Once you are done with either option, run
$ sudo ldconfig
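As a quick sanity check of the LD_LIBRARY_PATH option, something like the following could be used. This is only a minimal sketch: the helper name dirs_providing is made up for illustration, and it inspects the environment variable only, so it will not see directories registered through /etc/ld.so.conf.d/ and ldconfig.

```python
import os
from pathlib import Path


def dirs_providing(lib_name, search_path):
    """Return the entries of an os.pathsep-separated search path that contain lib_name."""
    hits = []
    for entry in search_path.split(os.pathsep):
        # Empty entries (e.g. from a leading or trailing separator) are skipped.
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Hypothetical usage: look for libopenblas.so in the current LD_LIBRARY_PATH.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    found = dirs_providing("libopenblas.so", current)
    print(found if found else "libopenblas.so not found via LD_LIBRARY_PATH")
```

If the list comes back empty in a new terminal session, the export line in ~/.bashrc probably has not taken effect yet.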
Grab the numpy source code:
$ git clone https://github.com/numpy/numpy
$ cd numpy
Copy site.cfg.example to site.cfg and edit the copy:
$ cp site.cfg.example site.cfg
$ nano site.cfg
Uncomment these lines:
....
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
....
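To confirm that the [openblas] section really is uncommented before building, the file can be parsed with Python's standard configparser module. This is a rough sketch rather than part of the numpy build; the function name read_openblas_section is an assumption made for illustration.

```python
import configparser


def read_openblas_section(path):
    """Return the [openblas] options from a site.cfg-style file as a dict.

    Returns None if the file is missing or the section is absent
    (for example because it is still commented out).
    """
    parser = configparser.RawConfigParser()
    parser.read(path)  # silently ignores files that do not exist
    if not parser.has_section("openblas"):
        return None
    return dict(parser.items("openblas"))


if __name__ == "__main__":
    options = read_openblas_section("site.cfg")
    if options is None:
        print("no active [openblas] section found in site.cfg")
    else:
        for key, value in options.items():
            print(f"{key} = {value}")
```

With the lines above uncommented, this should list libraries, library_dirs and include_dirs with the /opt/OpenBLAS paths.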
Check configuration, build and install (optionally inside a virtualenv)
$ python setup.py config
The output should look something like this:
...
openblas_info:
  FOUND:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

  FOUND:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
...
Installing with pip is preferable to using python setup.py install, since pip will keep track of the package metadata and allow you to easily uninstall or upgrade numpy in the future.
$ pip install .
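After installing, one way to check which BLAS the installed numpy reports is to capture the output of numpy.show_config() and look for "openblas" in it. The sketch below assumes show_config() prints to stdout; the helper name numpy_config_mentions is hypothetical, and it returns None when numpy cannot be found in the current environment.

```python
import contextlib
import importlib.util
import io


def numpy_config_mentions(name):
    """Return True/False if numpy's build config mentions name, or None if numpy is absent."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy as np

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()  # prints the build configuration
    # Case-insensitive substring match on the captured text.
    return name.lower() in buffer.getvalue().lower()


if __name__ == "__main__":
    print("numpy config mentions openblas:", numpy_config_mentions("openblas"))
```

A False result here would suggest that the build picked up a different BLAS than the one configured in site.cfg.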
Optional: you can use this script to test performance for different thread counts.
$ OMP_NUM_THREADS=1 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/opt/OpenBLAS/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
dot: 0.099796795845 sec
$ OMP_NUM_THREADS=8 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
* libraries ['openblas', 'openblas']
* library_dirs ['/opt/OpenBLAS/lib']
* define_macros [('HAVE_CBLAS', None)]
* language c
dot: 0.0439578056335 sec
There seems to be a noticeable improvement in performance for higher thread counts. However, I haven't tested this very systematically, and it's likely that for smaller matrices the additional overhead would outweigh the performance benefit from a higher thread count.
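To test a bit more systematically, the runs above could be scripted so that the same benchmark is launched once per thread count. The sketch below is only an illustration: env_with_threads and sweep are made-up helper names. Each run starts a fresh interpreter because, as far as I know, the thread-count variable is read when the library is loaded, so it needs to be set before numpy is imported.

```python
import os
import subprocess
import sys


def env_with_threads(n_threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set to n_threads."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def sweep(script_path, thread_counts=(1, 2, 4, 8)):
    """Run script_path once per thread count and return {count: captured stdout}."""
    outputs = {}
    for n in thread_counts:
        result = subprocess.run(
            [sys.executable, script_path],  # fresh interpreter for every run
            env=env_with_threads(n),
            capture_output=True,
            text=True,
            check=True,  # raise if the benchmark script fails
        )
        outputs[n] = result.stdout
    return outputs
```

Calling sweep("build/test_numpy.py") would then reproduce the two runs shown above along with a couple of intermediate thread counts.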