加速numpy.dot [英] Speeding up numpy.dot

查看:406
本文介绍了加速numpy.dot的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个numpy脚本,它在以下代码中花费了其运行时的大约50%:

I've got a numpy script that spends about 50% of its runtime in the following code:

s = numpy.dot(v1, v1)

其中

v1 = v[1:]

v是存储在连续内存(v.strides(8,))中的float64的4000个元素的一维ndarray.

and v is a 4000-element 1D ndarray of float64 stored in contiguous memory (v.strides is (8,)).

有什么建议可以加快速度吗?

Any suggestions for speeding this up?

编辑这是在Intel硬件上.这是我的numpy.show_config()的输出:

edit This is on Intel hardware. Here is the output of my numpy.show_config():

atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']

lapack_mkl_info:
  NOT AVAILABLE

blas_mkl_info:
  NOT AVAILABLE

mkl_info:
  NOT AVAILABLE

推荐答案

您的数组不是很大,因此ATLAS可能不会做很多事情.您接下来的Fortran计划的时间是什么?假设ATLAS并没有做太多事情,这应该让您了解如果没有任何python开销,dot()可能有多快.使用gfortran -O3,我的速度为5 +/- 0.5 us.

Your arrays are not very big, so ATLAS probably isn't doing much. What are your timings for the following Fortran program? Assuming ATLAS isn't doing much, this should give you a sense of how fast dot() could be if there was not any python overhead. With gfortran -O3 I get speeds of 5 +/- 0.5 us.

    program test

    real*8 :: x(4000), start, finish, s
    integer :: i, j
    integer,parameter :: jmax = 100000

    x(:) = 4.65
    s = 0.
    call cpu_time(start)
    do j=1,jmax
        s = s + dot_product(x, x)
    enddo
    call cpu_time(finish)
    print *, (finish-start)/jmax * 1.e6, s

    end program test

这篇关于加速numpy.dot的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆