numpy on multicore hardware

Question

What's the state of the art with regard to getting numpy to use multiple cores (on Intel hardware) for things like inner and outer vector products, vector-matrix multiplications, etc.?

I am happy to rebuild numpy if necessary, but at this point I am looking at ways to speed things up without changing my code.

For reference, my show_config() is as follows, and I've never observed numpy to use more than one core:

atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_mkl_info:
  NOT AVAILABLE

blas_mkl_info:
  NOT AVAILABLE

mkl_info:
  NOT AVAILABLE

Answer

You should probably start by checking whether the Atlas build that numpy is using was built with multithreading support. You can build and run this program, based on the one in the Atlas FAQ, to inspect the Atlas configuration:

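#include <stdlib.h>

/*
 * Compile, link and run with something like (assuming this file is saved
 * as xprint_buildinfo.c):
 *    gcc -o xprint_buildinfo xprint_buildinfo.c -L[ATLAS lib dir] -latlas ; ./xprint_buildinfo
 * If linking fails, you are using an ATLAS version older than 3.3.6.
 */
int main(void)
{
   void ATL_buildinfo(void);  /* defined in libatlas; prints the build configuration */
   ATL_buildinfo();
   exit(0);
}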

If you don't have a multithreaded version of Atlas, there's your problem. If it is multithreaded, then you need to exercise one of the multithreaded BLAS 3 routines (probably dgemm) with a suitably large matrix-matrix product and see whether threading is used. I believe neither the BLAS 2 nor the BLAS 1 routines in Atlas support multithreading, and with good reason: except at truly enormous problem sizes, there is no performance advantage.
