Fastest Way to Find the Dot Product of a Large Matrix of Vectors


Problem Description

I am looking for suggestions on the most efficient way to solve the following problem:

I have two arrays called A and B. They are both of shape NxNx3. They represent two 2D matrices of positions, where each position is a vector of x, y, and z coordinates.

I want to create a new array, called C, of shape NxN, where C[i, j] is the dot product of the vectors A[i, j] and B[i, j].

Here are the solutions I've come up with so far. The first uses numpy's einsum function (which is beautifully described here). The second uses numpy's broadcasting rules along with its sum function.

>>> import numpy as np
>>> A = np.random.randint(0, 10, (100, 100, 3))
>>> B = np.random.randint(0, 10, (100, 100, 3))
>>> C = np.einsum("ijk,ijk->ij", A, B)
>>> D = np.sum(A * B, axis=2)
>>> np.allclose(C, D)
True

Is there a faster way? I've heard murmurs that numpy's tensordot function can be blazing fast but I've always struggled to understand it. What about using numpy's dot, or inner functions?

For some context, the A and B arrays will typically have between 100 and 1000 elements.

Any guidance is greatly appreciated!

Answer

With a bit of reshaping, we can use matmul. The idea is to treat the first 2 dimensions as 'batch' dimensions, and to perform the dot product on the last:

In [278]: E = A[...,None,:]@B[...,:,None]                                       
In [279]: E.shape                                                               
Out[279]: (100, 100, 1, 1)
In [280]: E = np.squeeze(A[...,None,:]@B[...,:,None])                           
In [281]: np.allclose(C,E)                                                      
Out[281]: True
In [282]: timeit E = np.squeeze(A[...,None,:]@B[...,:,None])                    
130 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [283]: timeit C = np.einsum("ijk,ijk->ij", A, B)                             
90.2 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
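The same batched matmul can also be written with an explicit reshape, collapsing the two batch dimensions into one before the (1,3)@(3,1) products. This is just a sketch of an equivalent formulation, not necessarily faster:

```python
import numpy as np

N = 100
A = np.random.randint(0, 10, (N, N, 3))
B = np.random.randint(0, 10, (N, N, 3))

# Flatten the two batch dimensions into one, do a (1,3)@(3,1) matmul
# per position, then restore the N x N shape.
E2 = (A.reshape(-1, 1, 3) @ B.reshape(-1, 3, 1)).reshape(N, N)

C = np.einsum("ijk,ijk->ij", A, B)
print(np.allclose(C, E2))  # True
```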

Comparative timings can be a bit tricky. In the current versions, einsum can take different routes depending on the dimensions. In some cases it appears to delegate the task to matmul (or at least the same underlying BLAS-like code). While it's nice that einsum is faster in this test, I wouldn't generalize that.
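Related to einsum's route selection: einsum accepts an optimize flag that lets it search for a cheaper contraction path or BLAS dispatch. For a single two-operand contraction like this one the effect is usually small, but it is worth benchmarking on your own sizes; the result is identical either way:

```python
import numpy as np

A = np.random.randint(0, 10, (100, 100, 3))
B = np.random.randint(0, 10, (100, 100, 3))

# With optimize=True, einsum may pick a different evaluation route;
# the numerical result is the same.
C = np.einsum("ijk,ijk->ij", A, B)
C_opt = np.einsum("ijk,ijk->ij", A, B, optimize=True)
print(np.allclose(C, C_opt))  # True
```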

tensordot just reshapes (and if needed transposes) the arrays so it can apply the ordinary 2d np.dot. It actually doesn't work here: you want to treat the first 2 axes as a 'batch', whereas tensordot performs an outer product on them.
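To see the outer-product behavior concretely, here is a small sketch (small N, since the output grows as N^4). Contracting only axis 2 with tensordot yields an (N, N, N, N) result; the values we actually want sit on its 'diagonal' T[i, j, i, j]:

```python
import numpy as np

N = 4
A = np.random.randint(0, 10, (N, N, 3))
B = np.random.randint(0, 10, (N, N, 3))

# tensordot contracts only the specified axes and forms an outer
# product over the remaining ones, so the result is (N, N, N, N),
# not the (N, N) array we want.
T = np.tensordot(A, B, axes=([2], [2]))
print(T.shape)  # (4, 4, 4, 4)

# The desired C[i, j] values are the diagonal entries T[i, j, i, j].
C = np.einsum("ijk,ijk->ij", A, B)
print(np.allclose(np.einsum("ijij->ij", T), C))  # True
```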
