Speeding up numpy.dot
Question
I've got a numpy script that spends about 50% of its runtime in the following code:
s = numpy.dot(v1, v1)
where v1 = v[1:] and v is a 4000-element 1D ndarray of float64 stored in contiguous memory (v.strides is (8,)).
Any suggestions for speeding this up?
Edit: This is on Intel hardware. Here is the output of my numpy.show_config():
atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']
blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']
atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    language = c
    include_dirs = ['/usr/local/atlas-3.9.16/include']
lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/local/atlas-3.9.16/lib']
    define_macros = [('ATLAS_INFO', '"\\"3.9.16\\""')]
    language = f77
    include_dirs = ['/usr/local/atlas-3.9.16/include']
lapack_mkl_info:
    NOT AVAILABLE
blas_mkl_info:
    NOT AVAILABLE
mkl_info:
    NOT AVAILABLE
Recommended answer
Your arrays are not very big, so ATLAS probably isn't doing much. What are your timings for the following Fortran program? Assuming ATLAS isn't doing much, this should give you a sense of how fast dot() could be if there were no Python overhead. With gfortran -O3 I get timings of 5 +/- 0.5 us.
program test
  real*8 :: x(4000), start, finish, s
  integer :: i, j
  integer,parameter :: jmax = 100000
  x(:) = 4.65
  s = 0.
  call cpu_time(start)
  do j=1,jmax
    s = s + dot_product(x, x)
  enddo
  call cpu_time(finish)
  print *, (finish-start)/jmax * 1.e6, s
end program test
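To compare against the Fortran figure, the per-call cost of numpy.dot can be measured on the Python side in roughly the same way. This is only a minimal sketch under stated assumptions: the helper name time_per_call_us is hypothetical, the fill value 4.65 is copied from the Fortran program rather than from the question, and the lambda wrapper adds a small amount of call overhead of its own, so the result is an upper bound on the cost of the dot call itself.

```python
import timeit

import numpy as np

# Assumed test data: 4000 contiguous float64 values, as described in the question.
# The value 4.65 mirrors the Fortran program and is otherwise arbitrary.
v = np.full(4000, 4.65, dtype=np.float64)
v1 = v[1:]  # a view into v; still contiguous with strides (8,)


def time_per_call_us(func, number=20_000, repeat=5):
    """Hypothetical helper: best observed time per call, in microseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


# The lambda adds a little Python call overhead on top of numpy.dot itself.
dot_us = time_per_call_us(lambda: np.dot(v1, v1))
print(f"numpy.dot: {dot_us:.2f} us per call")
```

If the number printed here is close to the Fortran timing, Python overhead is not the bottleneck and there is little left to gain from tuning the call itself.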