使用 Numpy 进行外减法 [英] Outer subtraction with Numpy

查看:33
本文介绍了使用 Numpy 进行外减法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我只想做:C_i=\Sum_k (A_i -B_k)^2我发现使用简单的 for 循环 比使用 numpy.subtract.outer 计算更快!无论如何,我觉得 numpy.einsum 将是最快的.我不能很好地理解 numpy.einsum.任何人都可以帮我吗?此外,如果有人解释如何使用 numpy.einsum 编写由向量/矩阵组成的通用求和表达式,那就太好了?

I simply want to do: C_i=\Sum_k (A_i -B_k)^2 I saw that this calculation is faster with a simple for loop than with the numpy.subtract.outer! Anyway I feel that numpy.einsum would be the fastest. I could not understand numpy.einsum that well. Can anyone please help me out? Additionally, it would be great if someone explains how a general summation expression consisting of vector/matrices can be written with numpy.einsum?

我没有在网上找到这个特定问题的解决方案.抱歉,如果以某种方式重复.

I did not find solution for this particular problem on the web. Sorry if duplicate in some manner.

MWE with loop and numpy.subtract.outer--

MWE with loop and numpy.subtract.outer--

A)带循环

import timeit
code1="""
import numpy as np

N=10000;

a=np.random.rand(N); b=10*(np.random.rand(N)-0.5);

def A1(x,y):
    Nx=len(x)
    z=np.zeros(Nx)
    for i in np.arange(Nx):
        z[i]=np.sum((x[i]-y)*(x[i]-y))

    return z

C1=A1(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print "time=", elapsed_time

B) 使用 numpy.subtract.outer

import timeit
code1="""
import numpy as np

N=10000;

a=np.random.rand(N); b=10*(np.random.rand(N)-0.5);

def A2(x,y):
    C=np.subtract.outer(x,y);
    return np.sum(C*C, axis=1)

C2=A2(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print "time=", elapsed_time

对于 N=10000,循环变得更快.对于 N=100,外部减法变得更快.对于 N=10^5,外部减法在我的 8GB 内存桌面上面临内存问题!

For N=10000 the loop becomes faster. For N=100, the outer subtract becomes faster. For N=10^5, outer subtract faces memory issue on my desktop with 8GB ram!

推荐答案

至少使用 Numba 或 Fortran 实现

你的两个函数都很慢.Python 循环非常低效 (A1),并且分配大型临时数组也很慢(A2 和部分 A1).

Use at least Numba, or a Fortran Implementation

Both of your functions are quite slow. Python loops are very inefficient (A1), and allocating large temporary arrays is also slow (A2 and partially also A1).

小数组的朴素 Numba 实现

import numba as nb
import numpy as np

@nb.njit(parallel=True, fastmath=True)
def A_nb_p(x,y):

    z=np.empty(x.shape[0])
    for i in nb.prange(x.shape[0]):
        TMP=0.
        for j in range(y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    return z

时间

import time
N=int(1e5)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)

t1=time.time()
res_1=A1(a,b)
print(time.time()-t1)
#95.30195426940918 s

t1=time.time()
res_2=A_nb_p(a,b)
print(time.time()-t1)
#0.28573083877563477 s

#A2 is too slow to measure

如上所述,这是在较大数组上的幼稚实现,因为 Numba 无法按块进行计算,这会导致大量缓存未命中,从而导致性能不佳.一些 Fortran 或 C 编译器至少应该能够自动进行以下优化(逐块评估).

As written above this is a naive implementation on larger arrays, since Numba isn't able to do the calculation blockwise, which leads to a lot of cache misses and therefore bad performance. Some Fortran or C- compiler should be able to do at least the following optimization (block-wise evaluation) automatically.

大型数组的实现

@nb.njit(parallel=True, fastmath=True)
def A_nb_p_2(x,y):
    blk_s=1024
    z=np.zeros(x.shape[0])
    num_blk_x=x.shape[0]//blk_s
    num_blk_y=y.shape[0]//blk_s

    for ii in nb.prange(num_blk_x):
        for jj in range(num_blk_y):
            for i in range(blk_s):
                TMP=z[ii*blk_s+i]
                for j in range(blk_s):
                    TMP+=(x[ii*blk_s+i]-y[jj*blk_s+j])**2
                z[ii*blk_s+i]=TMP

    for i in nb.prange(x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s,y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    for i in nb.prange(num_blk_x*blk_s,x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    return z

时间

N=int(2*1e6)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)

t1=time.time()
res_1=A_nb_p(a,b)
print(time.time()-t1)
#298.9394392967224

t1=time.time()
res_2=A_nb_p_2(a,b)
print(time.time()-t1)
#70.12

这篇关于使用 Numpy 进行外减法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆