Is there any good way to optimize the speed of this Python code?


Problem description

I have the following piece of code, which evaluates a numerical expression and integrates it over a range of values. The code currently runs in about 8.6 s, but that is with mock values, and my actual arrays are much larger. In particular, the actual shape of freq_c is (3800, 101) and the shape of number_bin is (3800, 100), which makes the code really inefficient: the total execution time would be close to 9 minutes for the actual arrays. One of the slow parts is the evaluation of k_one_third and k_two_third. I have also tried numexpr.evaluate("..") for these, which speeds the code up by about 10-20%, but I have left numexpr out below so that anyone can run the code without installing the package. Are there any other ways to improve the speed of this code? Even a speedup by a factor of a few would be good enough. Please note that the for loop is almost unavoidable because of memory constraints: the arrays are really large, so I process one frequency at a time inside the loop. I also wonder whether Numba JIT optimisation is possible here.
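To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes float64 arrays and assumes that theta and frequency keep their mock sizes of 100 points each; the real sizes are not stated in the question, so the numbers are only indicative.

import math

def array_bytes(shape, itemsize=8):
    # Size in bytes of a dense array with the given shape (float64 -> 8 bytes per item).
    return math.prod(shape) * itemsize

# Intermediate array for ONE frequency: (freq_c rows, theta, gamma)
per_frequency_bytes = array_bytes((3800, 100, 101))

# Same array if all 100 frequencies were processed at once (assumed size)
all_frequencies_bytes = per_frequency_bytes * 100

print(per_frequency_bytes / 1e9, "GB per frequency")
print(all_frequencies_bytes / 1e9, "GB for all frequencies at once")

Under these assumptions a single intermediate array is already a few hundred megabytes, and a fully vectorised version over all frequencies would be about a hundred times larger, which is why the loop over frequency is kept.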

import numpy as np
import scipy 
from scipy.integrate import simps as simps
import time

def k_one_third(x):
    return (2.*np.exp(-x**2)/x**(1/3) + 4./x**(1/6)*np.exp(-x)/(1+x**(1/3)))**2

def k_two_third(x):
    return (np.exp(-x**2)/x**(2/3) + 2.*x**(5/2)*np.exp(-x)/(6.+x**3))**2

def spectrum(freq_c, number_bin, frequency, gamma, theta):
    theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2)
    theta_gamma_factor += 1.
    t_g_bessel_factor = 1.-1./theta_gamma_factor
    number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=number_bin.dtype)), axis=1)
    number_theta_gamma = np.einsum('jk, ik->ijk', theta_gamma_factor**2*1./gamma**3, number)
    final = np.zeros((np.size(freq_c[:,0]), np.size(theta), np.size(frequency)))
    for i in range(np.size(frequency)):
        b_n_omega_theta_gamma = frequency[i]**2*number_theta_gamma
        eta = theta_gamma_factor**(1.5)*frequency[i]/2.
        eta = np.einsum('jk, ik->ijk', eta, 1./freq_c)
        bessel_eta = np.einsum('jl, ijl->ijl',t_g_bessel_factor, k_one_third(eta))
        bessel_eta += k_two_third(eta)
        eta = None
        integrand = np.multiply(bessel_eta, b_n_omega_theta_gamma, out= bessel_eta)
        final[:,:, i] = simps(integrand, gamma)
        integrand = None
    return final

frequency = np.linspace(1, 100, 100)
theta = np.linspace(1, 3, 100)
gamma = np.linspace(2, 200, 101)
freq_c = np.random.randint(1, 200, size=(50, 101))
number_bin = np.random.randint(1, 100, size=(50, 100))
time1 = time.time()
spectra = spectrum(freq_c, number_bin, frequency, gamma, theta)
print(time.time()-time1)

Recommended answer

As said in the comments, large parts of the code should be rewritten to get the best performance.

I have only rewritten the Simpson integration and slightly modified @HYRY's answer. With the test data you provided, this speeds up the calculation from 26.15 s to 1.76 s (15x). Replacing the np.einsum calls with simple loops should bring it below one second. (Roughly 0.4 s of the gain comes from the improved integration and 24 s from k_one_two_third(x).)
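The integration speedup relies on the gamma grid being equally spaced, so the spacing can be passed once instead of the full x array. As a minimal sketch of that idea in plain NumPy (simpson_uniform is a hypothetical helper name, not part of the answer, and it assumes an odd number of samples), the composite Simpson rule along the last axis could look like this:

import numpy as np

def simpson_uniform(y, dx):
    # Composite Simpson rule along the last axis for equally spaced samples.
    # Assumes an odd number of samples, i.e. an even number of intervals.
    y = np.asarray(y, dtype=float)
    if y.shape[-1] < 3 or y.shape[-1] % 2 == 0:
        raise ValueError("need an odd number of samples (at least 3)")
    s = y[..., 0] + y[..., -1]                 # end points, weight 1
    s = s + 4.0 * y[..., 1:-1:2].sum(axis=-1)  # odd interior points, weight 4
    s = s + 2.0 * y[..., 2:-1:2].sum(axis=-1)  # even interior points, weight 2
    return dx / 3.0 * s

gamma = np.linspace(2, 200, 101)
dx = gamma[1] - gamma[0]
y = np.stack([gamma**2, gamma**3])   # two test integrands, one per row
result = simpson_uniform(y, dx)
print(result)

Simpson's rule is exact for polynomials up to degree three, so the two rows should match the analytic integrals of x**2 and x**3 over [2, 200] up to rounding.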

It is worth reading up on how to get good performance out of Numba. The latest Numba version (0.39), the Intel SVML package, and options like fastmath=True make quite a big difference for your example.

Code

import numba as nb
import numpy as np

#a bit faster than HYRY's version
@nb.njit(parallel=True,fastmath=True,error_model='numpy')
def k_one_two_third(x):
  one=np.empty(x.shape,dtype=x.dtype)
  two=np.empty(x.shape,dtype=x.dtype)
  for i in nb.prange(x.shape[0]):
    for j in range(x.shape[1]):
      for k in range(x.shape[2]):
        x0 = x[i,j,k] ** (1/3)
        x1 = np.exp(-x[i,j,k] ** 2)
        x2 = np.exp(-x[i,j,k])
        one[i,j,k] = (2*x1/x0 + 4*x2/(x[i,j,k]**(1/6)*(x0 + 1)))**2
        two[i,j,k] = (2*x[i,j,k]**(5/2)*x2/(x[i,j,k]**3 + 6) + x1/x[i,j,k]**(2/3))**2
  return one, two

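#improved integration: composite Simpson rule for equally spaced samples
#(expects an odd number of samples, as with the 101 gamma values here)
@nb.njit(fastmath=True)
def simpson_nb(y,dx):
  s = y[0]+y[-1]

  n=y.shape[0]//2
  for i in range(n-1):
    s += 4.*y[i*2+1]
    s += 2.*y[i*2+2]

  s += 4*y[(n-1)*2+1]
  return (dx/3.)*s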

#np.einsum is not supported in nopython mode, so this wrapper runs in object mode
@nb.jit(forceobj=True)
def spectrum(freq_c, number_bin, frequency, gamma, theta):
    theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2)
    theta_gamma_factor += 1.
    t_g_bessel_factor = 1.-1./theta_gamma_factor
    number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=number_bin.dtype)), axis=1)
    number_theta_gamma = np.einsum('jk, ik->ijk', theta_gamma_factor**2*1./gamma**3, number)
    final = np.empty((np.size(frequency),np.size(freq_c[:,0]), np.size(theta)))

    #assume that dx is constant over the integration range
    #passing dx instead of x speeds up scipy's simps by about 4x
    #the Numba version is about 60x faster than scipy's simps(y,x)
    dx=gamma[1]-gamma[0]

    for i in range(np.size(frequency)):
        b_n_omega_theta_gamma = frequency[i]**2*number_theta_gamma
        eta = theta_gamma_factor**(1.5)*frequency[i]/2.
        eta = np.einsum('jk, ik->ijk', eta, 1./freq_c)

        one,two=k_one_two_third(eta)

        bessel_eta = np.einsum('jl, ijl->ijl',t_g_bessel_factor, one)
        bessel_eta += two

        integrand = np.multiply(bessel_eta, b_n_omega_theta_gamma, out= bessel_eta)

        #integrate along gamma; note that final is indexed [frequency, freq_c row, theta]
        for j in range(integrand.shape[0]):
          for k in range(integrand.shape[1]):
            final[i,j, k] = simpson_nb(integrand[j,k,:],dx)
    return final
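The answer suggests replacing the np.einsum calls with simple loops. As a rough sketch of what that means for the 'jk,ik->ijk' pattern used in spectrum (outer_over_last_axis is a hypothetical helper name, and float64 inputs are assumed), the loop form and an equivalent broadcasting form look like this. The loop version is written in plain Python here; the assumption is that a function of this shape could be decorated with nb.njit, which is not tested below.

import numpy as np

def outer_over_last_axis(a, b):
    # Loop equivalent of np.einsum('jk,ik->ijk', a, b) for 2-D float arrays.
    n_i, n_k = b.shape
    n_j = a.shape[0]
    out = np.empty((n_i, n_j, n_k), dtype=np.float64)
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                out[i, j, k] = a[j, k] * b[i, k]
    return out

rng = np.random.default_rng(0)
a = rng.random((4, 5))   # plays the role of the (theta, gamma) factor
b = rng.random((3, 5))   # plays the role of 1./freq_c or number

loops = outer_over_last_axis(a, b)
broadcast = b[:, None, :] * a[None, :, :]    # same result via broadcasting
reference = np.einsum('jk,ik->ijk', a, b)
print(np.allclose(loops, reference), np.allclose(broadcast, reference))

Written as explicit loops, the product can also be fused with the k_one_two_third evaluation and the Simpson sum, so the large intermediate arrays never need to be materialised.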
