是否期望numba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编码慢? [英] Is it expected for numba's efficient square euclidean distance code to be slower than numpy's efficient counterpart?

查看:91
本文介绍了是否期望numba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编码慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我从(为什么这个numba代码比numpy代码慢6倍?),以便它可以处理x1为(n,m)

I modify the most efficient code from (Why this numba code is 6x slower than numpy code?) so that it can handle x1 being (n, m)

@nb.njit(fastmath=True,parallel=True)
def euclidean_distance_square_numba_v5(x1, x2):
    res = np.empty((x1.shape[0], x2.shape[0]), dtype=x2.dtype)
    for a_idx in nb.prange(x1.shape[0]):
        for o_idx in range(x2.shape[0]):
            val = 0.
            for i_idx in range(x2.shape[1]):
                tmp = x1[a_idx, i_idx] - x2[o_idx, i_idx]
                val += tmp * tmp 
            res[a_idx, o_idx] = val 
    return res

但是,效率更高的numpy版本仍然没有效率:

However, it is still not more efficient that the more efficient numpy's version:

def euclidean_distance_square_einsum(x1, x2):
    return np.einsum('ij,ij->i', x1, x1)[:, np.newaxis] + np.einsum('ij,ij->i', x2, x2) - 2*np.dot(x1, x2.T)

输入为

a = np.zeros((1000000,512), dtype=np.float32)
b = np.zeros((100, 512), dtype=np.float32)

我得到的时间是numba代码为2.4723422527313232和numpy代码为0.8260958194732666.

The timing I got is 2.4723422527313232 for the numba code and 0.8260958194732666 for the numpy code.

推荐答案

是的,这是预期的.

您必须意识到的第一件事:点积是numpy版本的主力军,这里是稍小的数组:

The first thing you must be aware of: dot-product is the working horse of the numpy-version, here for slightly smaller arrays:

>>> def only_dot(x1, x2):
        return - 2*np.dot(x1, x2.T)

>>> a = np.zeros((1000,512), dtype=np.float32)
>>> b = np.zeros((100, 512), dtype=np.float32)

>>> %timeit(euclidean_distance_square_einsum(a,b))
6.08 ms ± 312 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit(euclidean_only_dot(a,b))
5.25 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

即花费了85%的时间.

i.e. 85% of the time are spent in it.

当您查看numba代码时,它看起来像是矩阵矩阵乘法的某种奇怪/不寻常/更复杂的版本-例如,可以看到相同的三个循环.

When you look at your numba-code, that looks like a somewhat strange/unusual/more complicated version of matrix-matrix-multiplication - one could see for example the same three loops.

因此,基本上,您正在尝试击败最好的最佳算法之一.例如,这是有人尝试这样做并失败.我的安装使用Intel的MKL版本,该版本必须比默认实现更为复杂,默认实现可以在

So basically, you are trying to beat one of the best optimized algorithms out there. Here is for example somebody trying to do it and failing. My installation uses Intel's MKL-version, which must be more sophisticated than the default-implementation, which can be found here.

有时候,在享受了整个乐趣之后,人们不得不承认自己的重新发明的轮子"不如最新的轮子好……但是只有这样,人们才能真正欣赏它的性能.

Sometimes, after the whole fun one had, one has to admit that the own "reinvented wheel" is not as good as the state-of-the-art wheel... but only then one can truly appreciate its performance.

这篇关于是否期望numba的有效平方欧几里德距离代码比numpy的有效欧几里得距离编码慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆