Improving runtime of Python NumPy code
Problem description
I have code that reassigns bins for a large NumPy array. The elements of the large array have been sampled at different frequencies, and the final goal is to rebin the entire array onto the fixed bins freq_bins. The code is rather slow for the array I have. Is there a good way to improve its runtime? A speedup by a factor of a few would do for now. Maybe some Numba magic would help.
```python
import time

import numpy as np

division = 90
freq_division = 50
cd = 3000
boost_factor = np.random.rand(division, division, cd)
freq_bins = np.linspace(1, 60, freq_division)
es = np.random.randint(1, 10, size=(cd, freq_division))
final_emit = np.zeros((division, division, freq_division))

time1 = time.time()
for i in range(division):  # was xrange (Python 2 only)
    # Boosted frequencies and weights, shape (division, cd, freq_division)
    fre_boost = np.einsum('ij, k->ijk', boost_factor[i], freq_bins)
    sky_by_cap = np.einsum('ij, jk->ijk', boost_factor[i], es)
    # Index of the fixed bin that each boosted frequency falls into
    freq_index = np.digitize(fre_boost, freq_bins)
    freq_index_reshaped = freq_index.reshape(division*cd, -1)
    freq_index = None  # release the memory early
    sky_by_cap_reshaped = sky_by_cap.reshape(freq_index_reshaped.shape)
    # Scatter-add the weights into the fixed bins, row by row
    to_bin_emit = np.zeros(freq_index_reshaped.shape)
    row_index = np.arange(freq_index_reshaped.shape[0]).reshape(-1, 1)
    np.add.at(to_bin_emit, (row_index, freq_index_reshaped), sky_by_cap_reshaped)
    to_bin_emit = to_bin_emit.reshape(fre_boost.shape)
    to_bin_emit = np.multiply(to_bin_emit, freq_bins, out=to_bin_emit)
    final_emit[i] = np.sum(to_bin_emit, axis=1)
print(time.time() - time1)
```
Recommended answer
Keep the code simple, then optimize
If you know which algorithm you want to implement, first write a simple reference implementation. From there you can go two ways in Python: you can try to vectorize the code, or you can compile it to get good performance.
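As a sketch of the vectorization route (an alternative not taken in this answer), the slow `np.add.at` scatter-add in the question can often be replaced by `np.bincount` with `weights`: each (row, bin) pair is flattened into one linear index, and `bincount` sums all values sharing that index. `np.add.at` has historically been slow, which is why this substitution is common. The helper names `rebin_add_at` and `rebin_bincount` are hypothetical, and the inputs are small random arrays rather than the question's data.

```python
import numpy as np


def rebin_add_at(values, idx, nbins):
    # Same pattern as the question: out[row, idx[row, k]] += values[row, k]
    out = np.zeros((values.shape[0], nbins))
    rows = np.arange(values.shape[0]).reshape(-1, 1)
    np.add.at(out, (rows, idx), values)
    return out


def rebin_bincount(values, idx, nbins):
    # Flatten (row, bin) into a single linear index row * nbins + bin,
    # then let np.bincount sum every value that shares an index.
    n_rows = values.shape[0]
    linear = (np.arange(n_rows)[:, None] * nbins + idx).ravel()
    summed = np.bincount(linear, weights=values.ravel(), minlength=n_rows * nbins)
    return summed.reshape(n_rows, nbins)


rng = np.random.default_rng(0)
vals = rng.random((6, 5))
idx = rng.integers(0, 5, size=(6, 5))
print(np.allclose(rebin_add_at(vals, idx, 5), rebin_bincount(vals, idx, 5)))
```

Both functions assume every bin index lies in `0 .. nbins - 1`; an index equal to `nbins` would raise in the `add.at` version and silently land in the next row in the `bincount` version.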
Even if np.einsum or np.add.at were implemented in Numba, it would be very hard for any compiler to generate efficient binary code from your example.
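It helps to see that the two `einsum` calls in the question are plain broadcast multiplications with no summation, so inside an explicit loop each element is just a product such as `boost_factor[i, j] * freq_bins[k]`. This sketch checks that on small, made-up arrays (`b`, `f`, `es` stand in for `boost_factor[i]`, `freq_bins`, `es`).

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.random((4, 7))                 # stands in for boost_factor[i], (division, cd)
f = np.linspace(1, 60, 5)              # stands in for freq_bins
es = rng.integers(1, 10, size=(7, 5))  # (cd, freq_division)

# 'ij,k->ijk' is an outer product: every b[i, j] times every f[k]
fre_boost = np.einsum('ij,k->ijk', b, f)
assert np.allclose(fre_boost, b[:, :, None] * f[None, None, :])

# 'ij,jk->ijk' scales row j of es by b[i, j]; j stays in the output, so no sum
sky_by_cap = np.einsum('ij,jk->ijk', b, es)
assert np.allclose(sky_by_cap, b[:, :, None] * es[None, :, :])
```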
The only thing I had to write myself is a more efficient version of digitize for scalar values.
Edit
The Numba source code contains an even more efficient implementation of digitize for scalar values.
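The speedup of the Numba-source version comes from replacing a linear scan over the bins with a binary search. For increasing bins and `right=False`, its loop (`lo = mid + 1` when `bins[mid] <= x`) is exactly Python's `bisect.bisect_right`, which returns the number of edges `<= x`, the same value as `np.digitize`. This pure-Python check is only an illustration; the `digitize_scalar` name is hypothetical and this is not the Numba code itself.

```python
import bisect

import numpy as np


def digitize_scalar(x, bins):
    # Binary search, O(log n): index i with bins[i-1] <= x < bins[i]
    return bisect.bisect_right(bins, x)


bins = list(np.linspace(1, 60, 50))
for x in (0.5, 1.0, 13.7, 59.99, 60.0, 75.0):
    assert digitize_scalar(x, bins) == np.digitize(x, bins)
```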
Code
```python
import numba as nb
import numpy as np

# From Numba source
# Copyright (c) 2012, Anaconda, Inc.
# All rights reserved.
@nb.njit(fastmath=True)
def digitize(x, bins, right=False):
    # bins are monotonically-increasing
    n = len(bins)
    lo = 0
    hi = n

    if right:
        if np.isnan(x):
            # Find the first nan (i.e. the last from the end of bins,
            # since there shouldn't be many of them in practice)
            for i in range(n, 0, -1):
                if not np.isnan(bins[i - 1]):
                    return i
            return 0
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] < x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid
    else:
        if np.isnan(x):
            # NaNs end up in the last bin
            return n
        while hi > lo:
            mid = (lo + hi) >> 1
            if bins[mid] <= x:
                # mid is too low => narrow to upper bins
                lo = mid + 1
            else:
                # mid is too high, or is a NaN => narrow to lower bins
                hi = mid

    return lo


# First, simpler version (linear scan over the bins). It was also named
# digitize; renamed here so it does not shadow the faster version above.
@nb.njit(fastmath=True)
def digitize_linear(value, bins):
    if value < bins[0]:
        return 0
    if value >= bins[bins.shape[0] - 1]:
        return bins.shape[0]
    for l in range(1, bins.shape[0]):
        if value >= bins[l - 1] and value < bins[l]:
            return l
    return bins.shape[0]  # only reached for NaN input


@nb.njit(fastmath=True, parallel=True)
def inner_loop(boost_factor, freq_bins, es):
    # boost_factor is one slice boost_factor[i], shape (division, cd)
    res = np.zeros((boost_factor.shape[0], freq_bins.shape[0]), dtype=np.float64)
    for i in nb.prange(boost_factor.shape[0]):
        for j in range(boost_factor.shape[1]):
            for k in range(freq_bins.shape[0]):
                # Bin of the boosted frequency, then accumulate the weight
                # multiplied by that bin's frequency
                ind = nb.int64(digitize(boost_factor[i, j] * freq_bins[k], freq_bins))
                res[i, ind] += boost_factor[i, j] * es[j, k] * freq_bins[ind]
    return res


@nb.njit(fastmath=True)
def calc_nb(division, freq_division, cd, boost_factor, freq_bins, es):
    final_emit = np.empty((division, division, freq_division), np.float64)
    for i in range(division):
        final_emit[i, :, :] = inner_loop(boost_factor[i], freq_bins, es)
    return final_emit
```
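To convince yourself that `inner_loop` computes the same thing as one iteration of the question's loop, you can run the same loop nest without the `@nb.njit` decorator on tiny arrays and compare. In this sketch `inner_loop_py` and `one_slice_numpy` are hypothetical names, scalar `np.digitize` replaces the Numba `digitize`, and the boost values are drawn from [0, 1) as in the question, which keeps every bin index below `len(freq_bins)` (a larger boost would index past the end in both versions).

```python
import numpy as np


def inner_loop_py(boost, freq_bins, es):
    # Same loop nest as inner_loop, interpreted instead of compiled
    res = np.zeros((boost.shape[0], freq_bins.shape[0]))
    for i in range(boost.shape[0]):
        for j in range(boost.shape[1]):
            for k in range(freq_bins.shape[0]):
                ind = int(np.digitize(boost[i, j] * freq_bins[k], freq_bins))
                res[i, ind] += boost[i, j] * es[j, k] * freq_bins[ind]
    return res


def one_slice_numpy(boost, freq_bins, es):
    # One iteration of the question's loop body, with boost = boost_factor[i]
    fre_boost = np.einsum('ij,k->ijk', boost, freq_bins)
    sky_by_cap = np.einsum('ij,jk->ijk', boost, es)
    idx = np.digitize(fre_boost, freq_bins).reshape(-1, freq_bins.shape[0])
    to_bin = np.zeros(idx.shape)
    rows = np.arange(idx.shape[0]).reshape(-1, 1)
    np.add.at(to_bin, (rows, idx), sky_by_cap.reshape(idx.shape))
    to_bin = to_bin.reshape(fre_boost.shape) * freq_bins
    return to_bin.sum(axis=1)


rng = np.random.default_rng(3)
boost = rng.random((3, 4))
freq_bins = np.linspace(1, 60, 5)
es = rng.integers(1, 10, size=(4, 5))
print(np.allclose(inner_loop_py(boost, freq_bins, es), one_slice_numpy(boost, freq_bins, es)))
```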
Performance (quad-core i7)
original_code: 118.5 s
calc_nb, linear-scan digitize: 4.14 s
calc_nb, digitize implementation from Numba source: 2.66 s
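These figures correspond to speedups of roughly 29x and 45x over the original code. The answer does not say whether they include Numba's JIT compilation, which happens on the first call of an `@nb.njit` function. A minimal timing sketch that excludes it, assuming you want steady-state numbers, is to call once as a warm-up and then keep the fastest of a few runs; `best_time` is a hypothetical helper.

```python
import time


def best_time(func, *args, repeat=3):
    # Warm-up call: absorbs one-off costs such as JIT compilation
    func(*args)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `best_time(calc_nb, division, freq_division, cd, boost_factor, freq_bins, es)` with the arrays from the question.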