改善python numpy代码的运行时 [英] Improving runtime of python numpy code

查看:93
本文介绍了改善python numpy代码的运行时的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个代码,可以将垃圾箱重新分配给大型的 numpy 数组.基本上,大型数组的元素以不同的频率进行采样,最终目标是将整个数组重新绑定到固定的bins freq_bins .对于我拥有的数组,该代码有点慢.有什么好的方法可以改善这段代码的运行时间吗?目前只有少数几个因素可以做到.可能有些 numba 魔术会做.

I have a code which reassigns bins to a large numpy array. Basically, the elements of the large array has been sampled at different frequency and the final goal is to rebin the entire array at fixed bins freq_bins. The code is kind of slow for the array I have. Is there any good way to improve the runtime of this code? A factor of few would do for now. May be some numba magic would do.

import numpy as np
import time
division = 90
freq_division = 50
cd = 3000
boost_factor = np.random.rand(division, division, cd)
freq_bins = np.linspace(1, 60, freq_division)
es = np.random.randint(1,10, size = (cd, freq_division))
final_emit = np.zeros((division, division, freq_division))
time1 = time.time()
for i in xrange(division):
    fre_boost = np.einsum('ij, k->ijk', boost_factor[i], freq_bins)
    sky_by_cap = np.einsum('ij, jk->ijk', boost_factor[i],es)
    freq_index = np.digitize(fre_boost, freq_bins)
    freq_index_reshaped = freq_index.reshape(division*cd, -1)
    freq_index = None
    sky_by_cap_reshaped = sky_by_cap.reshape(freq_index_reshaped.shape)
    to_bin_emit = np.zeros(freq_index_reshaped.shape)
    row_index = np.arange(freq_index_reshaped.shape[0]).reshape(-1, 1)
    np.add.at(to_bin_emit, (row_index, freq_index_reshaped), sky_by_cap_reshaped)
    to_bin_emit = to_bin_emit.reshape(fre_boost.shape)
    to_bin_emit = np.multiply(to_bin_emit, freq_bins, out=to_bin_emit)
    final_emit[i] = np.sum(to_bin_emit, axis=1)
print(time.time()-time1)

推荐答案

保持代码简单而不是优化

如果您有想法要编写哪种算法,请编写一个简单的参考实现.由此,您可以使用Python进行两种方式.您可以尝试矢量化代码,也可以编译代码以获得良好的性能.

Keep the code simple and than optimize

If you have an idea what algorithm you want to code write a simple reference implementation. From this you can go two ways using Python. You can try to vectorize the code or you can compile the code to get good performance.

即使在Numba中实现了 np.einsum np.add.at ,对于任何编译器来说,都很难从您的示例中生成有效的二进制代码.

Even if np.einsum or np.add.at were implementet in Numba, it would be very hard for any compiler to make efficient binary code from your example.

我唯一重写的是一种更有效的数字化标量值的方法.

The only thing I have rewritten is a more efficient approach of digitize for scalar values.

修改

在Numba源代码中,对标量值进行数字化处理的效率更高.

In the Numba source code there is a more efficient implimentation of digitize for scalar values.

代码

#From Numba source
#Copyright (c) 2012, Anaconda, Inc.
#All rights reserved.

@nb.njit(fastmath=True)
def digitize(x, bins, right=False):
    # bins are monotonically-increasing
    n = len(bins)
    lo = 0
    hi = n

    if right:
        if np.isnan(x):
            # Find the first nan (i.e. the last from the end of bins,
            # since there shouldn't be many of them in practice)
            for i in range(n, 0, -1):
                if not np.isnan(bins[i - 1]):
                    return i
            return 0
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] < x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid
    else:
        if np.isnan(x):
            # NaNs end up in the last bin
            return n
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] <= x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid

    return lo

@nb.njit(fastmath=True)
def digitize(value, bins):
  if value<bins[0]:
    return 0

  if value>=bins[bins.shape[0]-1]:
    return bins.shape[0]

  for l in range(1,bins.shape[0]):
    if value>=bins[l-1] and value<bins[l]:
      return l

@nb.njit(fastmath=True,parallel=True)
def inner_loop(boost_factor,freq_bins,es):
  res=np.zeros((boost_factor.shape[0],freq_bins.shape[0]),dtype=np.float64)
  for i in nb.prange(boost_factor.shape[0]):
    for j in range(boost_factor.shape[1]):
      for k in range(freq_bins.shape[0]):
        ind=nb.int64(digitize(boost_factor[i,j]*freq_bins[k],freq_bins))
        res[i,ind]+=boost_factor[i,j]*es[j,k]*freq_bins[ind]
  return res

@nb.njit(fastmath=True)
def calc_nb(division,freq_division,cd,boost_factor,freq_bins,es):
  final_emit = np.empty((division, division, freq_division),np.float64)
  for i in range(division):
    final_emit[i,:,:]=inner_loop(boost_factor[i],freq_bins,es)
  return final_emit

性能

(Quadcore i7)
original_code: 118.5s
calc_nb: 4.14s
#with digitize implementation from Numba source
calc_nb: 2.66s

这篇关于改善python numpy代码的运行时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆