numpy中向量化函数的性能行为 [英] Performance behaviour of vectorized functions in numpy
问题描述
我想通过以下方式在python中执行数学积分:
[1]借助scipy.optimize.fsolve求解隐式方程,以找到被积数的最大位置
[2]在scipy.integrate.quad的帮助下,将被积数的最大值移至零,并从-10积分到10.
由于被整数有一个自由参数xi,因此我想在numpy的帮助下使用一系列xi值执行此操作,因此我使用numpy.vectorize.
现在有两种方法可以矢量化该算法:
[A]分别矢量化[1]和[2],并将vec_ [1]的结果提供给vec_ [2]作为输入
[B]将一次执行[1]和[2]的函数向量化
我注意到[A]比[B]快得多.这是代码:
from scipy import optimize, integrate
import numpy as np
from math import exp, sqrt, pi
import time
def integral_with_fsolve(xi):
xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
def integral(xi,xc):
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
def fsolve(xi):
return optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
vec_integral_with_fsolve = np.vectorize(integral_with_fsolve)
vec_integral = np.vectorize(integral)
vec_fsolve = np.vectorize(fsolve)
xi = np.linspace(0.,2.,1000)
t0=time.time()
something = vec_integral_with_fsolve(xi)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized in one: speed = {} ints/sec'.format(speed))
t0=time.time()
xc = vec_fsolve(xi)
something = vec_integral(xi,xc)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized seperately: speed = {} ints/sec'.format(speed))
输出总是类似
将向量矢量化为一个积分:f = 298.151473998 ints/sec
分别积分和求解矢量化:速度= 2136.75134429 ints/sec
由于这只是我实际问题的简化版本,因此我需要知道为什么会这样.有人可以解释一下吗?谢谢!
总而言之,当您使用一次"方法时,这是xc
变量的问题.是一个ndarray
,当在math.exp()
中用x
或xi
(均为浮点数)调用时,会使代码变慢.如果您采用一次性"方法制作xc=float(xc)
,您将获得与单独"方式几乎相同的性能.
下面是有关如何发现问题的详细说明.
使用 cProfile
可以很容易地看到瓶颈是:
AT ONCE:
ncalls tottime percall cumtime percall filename:lineno(function)
1001 0.002 0.000 2.040 0.002 quadpack.py:135(quad)
1001 0.001 0.000 2.038 0.002 quadpack.py:295(_quad)
1001 0.002 0.000 2.143 0.002 tmp.py:15(integral_with_fsolve)
231231 1.776 0.000 1.925 0.000 tmp.py:17(integrand)
470780 0.118 0.000 0.118 0.000 {math.exp}
1001 0.112 0.000 2.037 0.002 {scipy.integrate._quadpack._qagse}
SEPARATELY:
ncalls tottime percall cumtime percall filename:lineno(function)
1001 0.001 0.000 0.340 0.000 quadpack.py:135(quad)
1001 0.001 0.000 0.339 0.000 quadpack.py:295(_quad)
1001 0.001 0.000 0.341 0.000 tmp.py:9(integral)
231231 0.200 0.000 0.278 0.000 tmp.py:10(integrand)
470780 0.054 0.000 0.054 0.000 {math.exp}
1001 0.060 0.000 0.338 0.000 {scipy.integrate._quadpack._qagse}
总体时间是:
AT ONCE:
1 loops, best of 3: 1.91 s per loop
SEPARATELY:
1 loops, best of 3: 312 ms per loop
最大的区别在于integrand
,在integral_with_fsolve
中运行几乎需要7倍的时间.数值积分quad
也是如此.甚至math.exp
在单独"方法中也快两倍.
这表明每种方法中要评估的类型是不同的.实际上,这就是重点.当一次"运行时,您可以打印type(xc)
来查看它是numpy.ndarray, float64
,而在单独"方式中它只是一个float64
.将这些类型加到math.exp()
内似乎不是一个好主意,如下所示:
xa = -0.389760856858
xc = np.array([[-0.389760856858]],dtype='float64')
timeit for i in range(1000000): exp(xc+xa)
#1 loops, best of 3: 1.96 s per loop
timeit for i in range(1000000): exp(xa+xa)
#10 loops, best of 3: 173 ms per loop
在两种情况下,math.exp()
均返回float
.使用numpy
中的exp
,sqrt
和pi
可以减少差异,但是会使您的代码慢得多,可能是因为这些函数也可能返回ndarray
:
AT ONCE:
1 loops, best of 3: 4.46 s per loop
SEPARATELY:
1 loops, best of 3: 2.14 s per loop
在这种情况下,最好不要转换为ndarray
.好的方法是将其转换为float
,如下所示(在一次性"方法中这是必需的):
def integral_with_fsolve(xi):
xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
xc = float(xc) # <-- SEE HERE
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
使用新的时间:
AT ONCE:
1 loops, best of 3: 321 ms per loop
SEPARATELY:
1 loops, best of 3: 315 ms per loop
I want to perform a mathematical integration in python in the following way:
[1] Solve an implicit equation with the help of scipy.optimize.fsolve to find the maximum position of the integrand
[2] Shift the integrand's maximum to zero and integrate from -10 to 10 with the help of scipy.integrate.quad
Since the integrand has one free parameter xi, I want to perform this with a range of xi values with the help of numpy, so I use numpy.vectorize.
Now there are two ways of vectorizing this alghorithm:
[A] vectorize [1] and [2] separately and give the result of vec_[1] to vec_[2] as input
[B] vectorize a function that performs [1] and [2] at once
I noticed that [A] is much faster than [B]. Here is the code:
from scipy import optimize, integrate
import numpy as np
from math import exp, sqrt, pi
import time
def integral_with_fsolve(xi):
xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
def integral(xi,xc):
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
def fsolve(xi):
return optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
vec_integral_with_fsolve = np.vectorize(integral_with_fsolve)
vec_integral = np.vectorize(integral)
vec_fsolve = np.vectorize(fsolve)
xi = np.linspace(0.,2.,1000)
t0=time.time()
something = vec_integral_with_fsolve(xi)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized in one: speed = {} ints/sec'.format(speed))
t0=time.time()
xc = vec_fsolve(xi)
something = vec_integral(xi,xc)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized seperately: speed = {} ints/sec'.format(speed))
and the output is always something like
Integrate and fsolve vectorized in one: speed = 298.151473998 ints/sec
Integrate and fsolve vectorized seperately: speed = 2136.75134429 ints/sec
Since this is only a simplified version of my actual problem, I need to know why it behaves like this. Could someone please explain? Thanks!
In summary, this is a problem with the xc
variable when you use the "at once" approach. It is a ndarray
that, when called in math.exp()
with x
or xi
(both floats) turns the code to be much slower. If you make xc=float(xc)
in the "at once" approach you will get practically the same performance as in the "separately" appraoch.
Below it follows a detailed description about how to find that out.
Using cProfile
it is easy to see where the bottlenecks are:
AT ONCE:
ncalls tottime percall cumtime percall filename:lineno(function)
1001 0.002 0.000 2.040 0.002 quadpack.py:135(quad)
1001 0.001 0.000 2.038 0.002 quadpack.py:295(_quad)
1001 0.002 0.000 2.143 0.002 tmp.py:15(integral_with_fsolve)
231231 1.776 0.000 1.925 0.000 tmp.py:17(integrand)
470780 0.118 0.000 0.118 0.000 {math.exp}
1001 0.112 0.000 2.037 0.002 {scipy.integrate._quadpack._qagse}
SEPARATELY:
ncalls tottime percall cumtime percall filename:lineno(function)
1001 0.001 0.000 0.340 0.000 quadpack.py:135(quad)
1001 0.001 0.000 0.339 0.000 quadpack.py:295(_quad)
1001 0.001 0.000 0.341 0.000 tmp.py:9(integral)
231231 0.200 0.000 0.278 0.000 tmp.py:10(integrand)
470780 0.054 0.000 0.054 0.000 {math.exp}
1001 0.060 0.000 0.338 0.000 {scipy.integrate._quadpack._qagse}
The overall timing is:
AT ONCE:
1 loops, best of 3: 1.91 s per loop
SEPARATELY:
1 loops, best of 3: 312 ms per loop
The biggest difference is in integrand
, which takes almost 7x longer to run in integral_with_fsolve
. And the same happens to the numerical integration quad
. Even math.exp
is twice as fast in the "separately" approach.
It suggests that the types being evaluated in each approach is different. In fact that is the point. When running "at once" you can print type(xc)
to see it is a numpy.ndarray, float64
, while in the "separately" approach it is just a float64
. It seems not to be a good idea to sum these types inside a math.exp()
, as shown below:
xa = -0.389760856858
xc = np.array([[-0.389760856858]],dtype='float64')
timeit for i in range(1000000): exp(xc+xa)
#1 loops, best of 3: 1.96 s per loop
timeit for i in range(1000000): exp(xa+xa)
#10 loops, best of 3: 173 ms per loop
In both cases math.exp()
returns a float
. Using exp
, sqrt
and pi
from numpy
reduces the difference but it makes your code much slower, probably because these functions may also return a ndarray
:
AT ONCE:
1 loops, best of 3: 4.46 s per loop
SEPARATELY:
1 loops, best of 3: 2.14 s per loop
To it seems a good idea NOT to convert to a ndarray
in this case. The good idea is to convert to float
then, as shown below (just necessary in the "at once" approach):
def integral_with_fsolve(xi):
xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
xc = float(xc) # <-- SEE HERE
def integrand(x,xi):
return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
return integral[0]
with the new timing:
AT ONCE:
1 loops, best of 3: 321 ms per loop
SEPARATELY:
1 loops, best of 3: 315 ms per loop
这篇关于numpy中向量化函数的性能行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!