numpy 的累积点积 [英] Cumulative dot product with numpy
问题描述
我有一个 ndarray A,填充了 N 个 DxD 平方矩阵(形状 (N,D,D)).我想将其转换为相同形状的 ndarray B,其中 B[0]=A[0] 并且对于每个 i>0,B[i] = np.dot(B[i-1], A[i]]).虽然基本实现是显而易见的,但我想知道这个操作是否比 for 循环的实现更快.
I have an ndarray A, populated with N squared DxD matrices (shape (N,D,D)). I want to transform it into an ndarray B of the same shape, where B[0]=A[0] and for every i>0, B[i] = np.dot(B[i-1], A[i]). While a basic implementation is obvious, I wondered whether this operation has a faster implementation than a for loop.
例如,描述另一种执行计算的方法:
Let me, For example, describe another way to perform the calculation:
- B[0...N/2] = 计算 A[0]...A[N/2 - 1] 的基本方式
- B[N/2...N] = 以基本方式计算 A[N/2]...A[N]
- 返回 np.concatenate((B[0...N/2 - 1], np.dot(B[N/2 - 1], B[N/2...N])]
重点是 1 和 2 可以并行完成,而 3 是矢量化操作 - 并且可以根据需要进一步将这种拆分应用于阵列上的每一半.这让我想知道是否存在比基本 for 循环更好的选择(例如,我的建议是否已实现/是否是实际改进,或者是否更可取其他选项).
The emphasis is that 1 and 2 can be done in parallel and 3 is a vectorized operation - and that this split can be further applied for each half on the array as needed. This makes me wonder if a better option than the basic for loop exists (e.g whether what I'm suggesting is implemented/is an actual improvement, or whether another option is preferrable).
非常感谢,
伊夫塔赫
用于基准测试的最基本实现代码:
code for most basic implementation, for benchmarking:
import numpy as np
def cumdot(A):
B = np.empty(A.shape)
B[0] = A[0]
for i in range(1, A.shape[0]):
B[i] = B[i - 1] @ A[i]
return B
Edit2: 似乎在 numpy 中,所有 ufunc 都支持 .accumulate()(这正是我想要做的)和 matmul(其行为类似于点积),是一个通用 ufunc.这意味着 matmul 不是一个从两个标量到一个的函数,而是从两个矩阵到一个矩阵,因此当函数 accum 存在时,调用它会引发一个异常,说明在具有签名的 ufunc 上不能调用cumulative.如果这可以在签名的情况下发挥作用,我也很想知道.
It seems like in numpy, all ufuncs support a .accumulate() (which is exactly what I'm trying to do), and matmul (which behaves like a dot product), is a generalized ufunc. That means matmul is not a function from two scalars to one, but from two matrices to a matrix, and therefore while the function accumulate exist, calling it will raise an exception stating that accumulate is not callable on ufuncs that have a signature. If this can be made to work despite the signature thing, I'd also love to know.
推荐答案
我认为没有一种完全矢量化的方法可以仅使用 numpy 函数来做到这一点(但我很高兴被证明是错误的!).
I don't think there is a fully vectorized way to do this with just numpy functions (but I'd be happy to be proven wrong!).
您可以通过使用itertools.accumulate
来隐藏循环以生成累积产品.这是一个例子.要创建 A
,我将使用 N
随机正交矩阵,使用 scipy.stats.ortho_group
生成,以确保产品保持有界.
You can hide the loop by using itertools.accumulate
to generate the cumulative products. Here's an example. To create A
, I'll use N
random orthogonal matrices, generated using scipy.stats.ortho_group
, to ensure that the products remain bounded.
这里的前几行创建了 A
,形状为 (1000, 4, 4).
The first few lines here create A
, with shape (1000, 4, 4).
In [101]: from scipy.stats import ortho_group
In [102]: N = 1000
In [103]: D = 4
In [104]: A = ortho_group.rvs(D, size=N)
用itertools.accumulate
计算累积乘积,并将结果放入一个numpy数组中.
Compute the cumulative products with itertools.accumulate
, and put the result in a numpy array.
In [105]: from itertools import accumulate
In [106]: B = np.array(list(accumulate(A, np.matmul)))
验证我们从 cumdot(A)
得到相同的结果.
Verify that we get the same result from cumdot(A)
.
In [107]: def cumdot(A):
...: B = np.empty(A.shape)
...: B[0] = A[0]
...: for i in range(1, A.shape[0]):
...: B[i] = B[i - 1] @ A[i]
...: return B
...:
In [108]: B0 = cumdot(A)
In [109]: (B == B0).all()
Out[109]: True
检查性能.事实证明,使用 itertools.accumulate
稍微快一些.
Check the performance. It turns out the using itertools.accumulate
is slightly faster.
In [110]: %timeit B0 = cumdot(A)
2.89 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [111]: %timeit B = np.array(list(accumulate(A, np.matmul)))
2.44 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
这篇关于numpy 的累积点积的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!