numpy中向量化函数的性能行为 [英] Performance behaviour of vectorized functions in numpy

查看:66
本文介绍了numpy中向量化函数的性能行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想通过以下方式在python中执行数学积分:

[1]借助scipy.optimize.fsolve求解隐式方程,以找到被积数的最大位置

[2]在scipy.integrate.quad的帮助下,将被积数的最大值移至零,并从-10积分到10.

由于被整数有一个自由参数xi,因此我想在numpy的帮助下使用一系列xi值执行此操作,因此我使用numpy.vectorize.

现在有两种方法可以矢量化该算法:

[A]分别矢量化[1]和[2],并将vec_ [1]的结果提供给vec_ [2]作为输入

[B]将一次执行[1]和[2]的函数向量化

我注意到[A]比[B]快得多.这是代码:

from scipy import optimize, integrate
import numpy as np
from math import exp, sqrt, pi
import time

def integral_with_fsolve(xi):
    xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

def integral(xi,xc):
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

def fsolve(xi):
    return optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)

vec_integral_with_fsolve = np.vectorize(integral_with_fsolve)
vec_integral = np.vectorize(integral)
vec_fsolve = np.vectorize(fsolve)

xi = np.linspace(0.,2.,1000)

t0=time.time()
something = vec_integral_with_fsolve(xi)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized in one: speed = {} ints/sec'.format(speed))

t0=time.time()
xc = vec_fsolve(xi)
something = vec_integral(xi,xc)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized seperately: speed = {} ints/sec'.format(speed))

输出总是类似

将向量矢量化为一个积分:f = 298.151473998 ints/sec

分别积分和求解矢量化:速度= 2136.75134429 ints/sec

由于这只是我实际问题的简化版本,因此我需要知道为什么会这样.有人可以解释一下吗?谢谢!

解决方案

总而言之,当您使用一次"方法时,这是xc变量的问题.是一个ndarray,当在math.exp()中用xxi(均为浮点数)调用时,会使代码变慢.如果您采用一次性"方法制作xc=float(xc),您将获得与单独"方式几乎相同的性能.

下面是有关如何发现问题的详细说明.

使用 cProfile 可以很容易地看到瓶颈是:

AT ONCE:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 1001    0.002    0.000    2.040    0.002 quadpack.py:135(quad)
 1001    0.001    0.000    2.038    0.002 quadpack.py:295(_quad)
 1001    0.002    0.000    2.143    0.002 tmp.py:15(integral_with_fsolve)
231231   1.776    0.000    1.925    0.000 tmp.py:17(integrand)
470780   0.118    0.000    0.118    0.000 {math.exp}
 1001    0.112    0.000    2.037    0.002 {scipy.integrate._quadpack._qagse}

SEPARATELY:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 1001    0.001    0.000    0.340    0.000 quadpack.py:135(quad)
 1001    0.001    0.000    0.339    0.000 quadpack.py:295(_quad)
 1001    0.001    0.000    0.341    0.000 tmp.py:9(integral)
231231   0.200    0.000    0.278    0.000 tmp.py:10(integrand)
470780   0.054    0.000    0.054    0.000 {math.exp}
 1001    0.060    0.000    0.338    0.000 {scipy.integrate._quadpack._qagse}

总体时间是:

AT ONCE:
1 loops, best of 3: 1.91 s per loop
SEPARATELY:
1 loops, best of 3: 312 ms per loop

最大的区别在于integrand,在integral_with_fsolve中运行几乎需要7倍的时间.数值积分quad也是如此.甚至math.exp在单独"方法中也快两倍.

这表明每种方法中要评估的类型是不同的.实际上,这就是重点.当一次"运行时,您可以打印type(xc)来查看它是numpy.ndarray, float64,而在单独"方式中它只是一个float64.将这些类型加到math.exp()内似乎不是一个好主意,如下所示:

xa = -0.389760856858
xc = np.array([[-0.389760856858]],dtype='float64')

timeit for i in range(1000000): exp(xc+xa)
#1 loops, best of 3: 1.96 s per loop

timeit for i in range(1000000): exp(xa+xa)
#10 loops, best of 3: 173 ms per loop

在两种情况下,math.exp()均返回float.使用numpy中的expsqrtpi可以减少差异,但是会使您的代码慢得多,可能是因为这些函数也可能返回ndarray:

AT ONCE:
1 loops, best of 3: 4.46 s per loop
SEPARATELY:
1 loops, best of 3: 2.14 s per loop

在这种情况下,最好不要转换为ndarray.好的方法是将其转换为float,如下所示(在一次性"方法中这是必需的):

def integral_with_fsolve(xi):
    xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
    xc = float(xc) # <-- SEE HERE
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

使用新的时间:

AT ONCE:
1 loops, best of 3: 321 ms per loop
SEPARATELY:
1 loops, best of 3: 315 ms per loop

I want to perform a mathematical integration in python in the following way:

[1] Solve an implicit equation with the help of scipy.optimize.fsolve to find the maximum position of the integrand

[2] Shift the integrand's maximum to zero and integrate from -10 to 10 with the help of scipy.integrate.quad

Since the integrand has one free parameter xi, I want to perform this with a range of xi values with the help of numpy, so I use numpy.vectorize.

Now there are two ways of vectorizing this alghorithm:

[A] vectorize [1] and [2] separately and give the result of vec_[1] to vec_[2] as input

[B] vectorize a function that performs [1] and [2] at once

I noticed that [A] is much faster than [B]. Here is the code:

from scipy import optimize, integrate
import numpy as np
from math import exp, sqrt, pi
import time

def integral_with_fsolve(xi):
    xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

def integral(xi,xc):
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

def fsolve(xi):
    return optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)

vec_integral_with_fsolve = np.vectorize(integral_with_fsolve)
vec_integral = np.vectorize(integral)
vec_fsolve = np.vectorize(fsolve)

xi = np.linspace(0.,2.,1000)

t0=time.time()
something = vec_integral_with_fsolve(xi)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized in one: speed = {} ints/sec'.format(speed))

t0=time.time()
xc = vec_fsolve(xi)
something = vec_integral(xi,xc)
dur=(time.time()-t0)
speed = xi.size/dur
print('Integrate and fsolve vectorized seperately: speed = {} ints/sec'.format(speed))

and the output is always something like

Integrate and fsolve vectorized in one: speed = 298.151473998 ints/sec

Integrate and fsolve vectorized seperately: speed = 2136.75134429 ints/sec

Since this is only a simplified version of my actual problem, I need to know why it behaves like this. Could someone please explain? Thanks!

解决方案

In summary, this is a problem with the xc variable when you use the "at once" approach. It is a ndarray that, when called in math.exp() with xor xi (both floats) turns the code to be much slower. If you make xc=float(xc) in the "at once" approach you will get practically the same performance as in the "separately" appraoch.

Below it follows a detailed description about how to find that out.

Using cProfile it is easy to see where the bottlenecks are:

AT ONCE:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 1001    0.002    0.000    2.040    0.002 quadpack.py:135(quad)
 1001    0.001    0.000    2.038    0.002 quadpack.py:295(_quad)
 1001    0.002    0.000    2.143    0.002 tmp.py:15(integral_with_fsolve)
231231   1.776    0.000    1.925    0.000 tmp.py:17(integrand)
470780   0.118    0.000    0.118    0.000 {math.exp}
 1001    0.112    0.000    2.037    0.002 {scipy.integrate._quadpack._qagse}

SEPARATELY:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 1001    0.001    0.000    0.340    0.000 quadpack.py:135(quad)
 1001    0.001    0.000    0.339    0.000 quadpack.py:295(_quad)
 1001    0.001    0.000    0.341    0.000 tmp.py:9(integral)
231231   0.200    0.000    0.278    0.000 tmp.py:10(integrand)
470780   0.054    0.000    0.054    0.000 {math.exp}
 1001    0.060    0.000    0.338    0.000 {scipy.integrate._quadpack._qagse}

The overall timing is:

AT ONCE:
1 loops, best of 3: 1.91 s per loop
SEPARATELY:
1 loops, best of 3: 312 ms per loop

The biggest difference is in integrand, which takes almost 7x longer to run in integral_with_fsolve. And the same happens to the numerical integration quad. Even math.exp is twice as fast in the "separately" approach.

It suggests that the types being evaluated in each approach is different. In fact that is the point. When running "at once" you can print type(xc) to see it is a numpy.ndarray, float64, while in the "separately" approach it is just a float64. It seems not to be a good idea to sum these types inside a math.exp(), as shown below:

xa = -0.389760856858
xc = np.array([[-0.389760856858]],dtype='float64')

timeit for i in range(1000000): exp(xc+xa)
#1 loops, best of 3: 1.96 s per loop

timeit for i in range(1000000): exp(xa+xa)
#10 loops, best of 3: 173 ms per loop

In both cases math.exp() returns a float. Using exp, sqrt and pi from numpy reduces the difference but it makes your code much slower, probably because these functions may also return a ndarray:

AT ONCE:
1 loops, best of 3: 4.46 s per loop
SEPARATELY:
1 loops, best of 3: 2.14 s per loop

To it seems a good idea NOT to convert to a ndarray in this case. The good idea is to convert to float then, as shown below (just necessary in the "at once" approach):

def integral_with_fsolve(xi):
    xc = optimize.fsolve(lambda x: x+1./(1+exp(-x))-xi,0.)
    xc = float(xc) # <-- SEE HERE
    def integrand(x,xi):
        return exp(-(x-xi+xc)**2)/(2.*sqrt(2.*pi))/(1.+exp(x+xc))
    integral = integrate.quad(integrand,-10.,10.,args=(xi,),epsabs=0.)
    return integral[0]

with the new timing:

AT ONCE:
1 loops, best of 3: 321 ms per loop
SEPARATELY:
1 loops, best of 3: 315 ms per loop

这篇关于numpy中向量化函数的性能行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆