Outer subtraction with Numpy
Problem description
I simply want to compute C_i = \sum_k (A_i - B_k)^2. I found that this calculation is faster with a simple for loop than with numpy.subtract.outer! Still, I suspect that numpy.einsum would be the fastest, but I do not understand numpy.einsum very well. Can anyone please help me out? Additionally, it would be great if someone could explain how a general summation expression consisting of vectors/matrices can be written with numpy.einsum.
I did not find a solution for this particular problem on the web. Sorry if this is a duplicate in some way.
MWE with loop and numpy.subtract.outer
--
A) With a loop
import timeit
code1="""
import numpy as np
N=10000
a=np.random.rand(N); b=10*(np.random.rand(N)-0.5)
def A1(x,y):
    Nx=len(x)
    z=np.zeros(Nx)
    for i in np.arange(Nx):
        z[i]=np.sum((x[i]-y)*(x[i]-y))
    return z
C1=A1(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print("time=", elapsed_time)
B) With numpy.subtract.outer
import timeit
code1="""
import numpy as np
N=10000
a=np.random.rand(N); b=10*(np.random.rand(N)-0.5)
def A2(x,y):
    C=np.subtract.outer(x,y)
    return np.sum(C*C, axis=1)
C2=A2(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print("time=", elapsed_time)
For N=10000 the loop is faster. For N=100 the outer subtraction is faster. For N=10^5 the outer subtraction runs into memory problems on my desktop with 8 GB of RAM!
Recommended answer
Use at least Numba, or a Fortran implementation
Both of your functions are quite slow. Python loops are very inefficient (A1), and allocating large temporary arrays is also slow (A2 and, in part, A1). In A2, np.subtract.outer materializes the full N x N float64 array: about 800 MB for N=10^4 and about 80 GB for N=10^5, which is why it runs out of memory on an 8 GB machine.
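Since the question also asks about numpy.einsum, here is one way to avoid both the Python loop and the N x N temporary while staying in NumPy. It is not part of the original answer, only a minimal sketch. It relies on expanding the square: sum_k (A_i - B_k)^2 = n*A_i^2 - 2*A_i*sum_k B_k + sum_k B_k^2, with n = len(B). The function name sum_sq_diff_expanded is hypothetical. The expanded form can lose precision through cancellation when the values are large compared with their differences.

```python
import numpy as np

def sum_sq_diff_expanded(x, y):
    """C_i = sum_k (x_i - y_k)^2 via the expanded square, using O(N) memory."""
    n = y.shape[0]
    sum_y = y.sum()                     # sum_k y_k
    # 'k,k->' multiplies y by itself element-wise and sums over k: sum_k y_k^2
    sum_y2 = np.einsum('k,k->', y, y)
    return n * x * x - 2.0 * x * sum_y + sum_y2

rng = np.random.default_rng(0)
a = rng.random(1000)
b = 10 * (rng.random(1000) - 0.5)

# Reference: the outer-subtraction version from the question (fine for small N).
d = np.subtract.outer(a, b)
ref = np.sum(d * d, axis=1)
assert np.allclose(sum_sq_diff_expanded(a, b), ref)
```

In einsum notation, an index that appears in the inputs but not after the arrow is summed over, so 'k,k->' is a full contraction to a scalar.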
Naive Numba implementation for small arrays
import numba as nb
import numpy as np
@nb.njit(parallel=True, fastmath=True)
def A_nb_p(x,y):
    z=np.empty(x.shape[0])
    for i in nb.prange(x.shape[0]):
        TMP=0.
        for j in range(y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP
    return z
Timings
import time
N=int(1e5)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)
t1=time.time()
res_1=A1(a,b)
print(time.time()-t1)
#95.30195426940918 s
t1=time.time()
res_2=A_nb_p(a,b)
print(time.time()-t1)
#0.28573083877563477 s
#A2 is too slow to measure
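When reproducing such timings, keep in mind that Numba by default compiles a jitted function on its first call, so a single timed call can include compilation time. The helper below is a minimal sketch of one way to separate the two. The name time_after_warmup and its repeats parameter are hypothetical, and the usage line uses the built-in sum as a stand-in so that the example runs without Numba.

```python
import time

def time_after_warmup(func, *args, repeats=3):
    """Call func once untimed, then return the best of `repeats` timed calls (seconds)."""
    func(*args)  # warm-up call: triggers JIT compilation for jitted functions
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Stand-in usage; with Numba installed one would pass A_nb_p, a, b instead.
print(time_after_warmup(sum, range(100_000)))
```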
As noted above, this naive implementation does not scale well to larger arrays: Numba does not evaluate the loops block-wise, which leads to many cache misses and therefore poor performance. Some Fortran or C compilers should be able to apply at least the following optimization (block-wise evaluation) automatically.
Implementation for large arrays
@nb.njit(parallel=True, fastmath=True)
def A_nb_p_2(x,y):
    blk_s=1024
    z=np.zeros(x.shape[0])
    num_blk_x=x.shape[0]//blk_s
    num_blk_y=y.shape[0]//blk_s
    # full blocks of x against full blocks of y
    for ii in nb.prange(num_blk_x):
        for jj in range(num_blk_y):
            for i in range(blk_s):
                TMP=z[ii*blk_s+i]
                for j in range(blk_s):
                    TMP+=(x[ii*blk_s+i]-y[jj*blk_s+j])**2
                z[ii*blk_s+i]=TMP
    # all of x against the remainder of y
    for i in nb.prange(x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s,y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP
    # remainder of x against the full blocks of y
    for i in nb.prange(num_blk_x*blk_s,x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP
    return z
Timings
N=int(2*1e6)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)
t1=time.time()
res_1=A_nb_p(a,b)
print(time.time()-t1)
#298.9394392967224
t1=time.time()
res_2=A_nb_p_2(a,b)
print(time.time()-t1)
#70.12
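The same block-wise idea can be sketched in plain NumPy, which also shows numpy.einsum in use. This is an illustration only, not the answer's method, and it is not claimed to match the Numba timings above. The function name sum_sq_diff_blocked and the blk parameter are hypothetical. The point is that only a temporary of at most blk x blk elements exists at any time, instead of the full array built by np.subtract.outer(x, y).

```python
import numpy as np

def sum_sq_diff_blocked(x, y, blk=1024):
    """C_i = sum_k (x_i - y_k)^2, accumulated over (blk x blk) tiles."""
    z = np.zeros(x.shape[0])
    for i0 in range(0, x.shape[0], blk):
        xi = x[i0:i0 + blk]
        for j0 in range(0, y.shape[0], blk):
            # Tile of pairwise differences, shape at most (blk, blk).
            d = np.subtract.outer(xi, y[j0:j0 + blk])
            # 'ik,ik->i': multiply element-wise, then sum over the index k.
            z[i0:i0 + blk] += np.einsum('ik,ik->i', d, d)
    return z

rng = np.random.default_rng(1)
a = rng.random(2500)
b = 10 * (rng.random(1300) - 0.5)
ref = np.sum(np.subtract.outer(a, b) ** 2, axis=1)
assert np.allclose(sum_sq_diff_blocked(a, b), ref)
```

Slicing past the end of an array simply yields a shorter slice, so the remainder tiles need no separate handling here, unlike in the Numba version.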