numpy 的累积点积 [英] Cumulative dot product with numpy

查看:82
本文介绍了numpy 的累积点积的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 ndarray A,填充了 N 个 DxD 平方矩阵(形状 (N,D,D)).我想将其转换为相同形状的 ndarray B,其中 B[0]=A[0] 并且对于每个 i>0,B[i] = np.dot(B[i-1], A[i]]).虽然基本实现是显而易见的,但我想知道这个操作是否比 for 循环的实现更快.

I have an ndarray A, populated with N squared DxD matrices (shape (N,D,D)). I want to transform it into an ndarray B of the same shape, where B[0]=A[0] and for every i>0, B[i] = np.dot(B[i-1], A[i]). While a basic implementation is obvious, I wondered whether this operation has a faster implementation than a for loop.

例如,描述另一种执行计算的方法:

Let me, For example, describe another way to perform the calculation:

  1. B[0...N/2] = 计算 A[0]...A[N/2 - 1] 的基本方式
  2. B[N/2...N] = 以基本方式计算 A[N/2]...A[N]
  3. 返回 np.concatenate((B[0...N/2 - 1], np.dot(B[N/2 - 1], B[N/2...N])]

重点是 1 和 2 可以并行完成,而 3 是矢量化操作 - 并且可以根据需要进一步将这种拆分应用于阵列上的每一半.这让我想知道是否存在比基本 for 循环更好的选择(例如,我的建议是否已实现/是否是实际改进,或者是否更可取其他选项).

The emphasis is that 1 and 2 can be done in parallel and 3 is a vectorized operation - and that this split can be further applied for each half on the array as needed. This makes me wonder if a better option than the basic for loop exists (e.g whether what I'm suggesting is implemented/is an actual improvement, or whether another option is preferrable).

非常感谢,

伊夫塔赫

用于基准测试的最基本实现代码:

code for most basic implementation, for benchmarking:

import numpy as np

def cumdot(A):
    B = np.empty(A.shape)
    B[0] = A[0]
    for i in range(1, A.shape[0]):  
        B[i] = B[i - 1] @ A[i]
    return B

Edit2: 似乎在 numpy 中,所有 ufunc 都支持 .accumulate()(这正是我想要做的)和 matmul(其行为类似于点积),是一个通用 ufunc.这意味着 matmul 不是一个从两个标量到一个的函数,而是从两个矩阵到一个矩阵,因此当函数 accum 存在时,调用它会引发一个异常,说明在具有签名的 ufunc 上不能调用cumulative.如果这可以在签名的情况下发挥作用,我也很想知道.

It seems like in numpy, all ufuncs support a .accumulate() (which is exactly what I'm trying to do), and matmul (which behaves like a dot product), is a generalized ufunc. That means matmul is not a function from two scalars to one, but from two matrices to a matrix, and therefore while the function accumulate exist, calling it will raise an exception stating that accumulate is not callable on ufuncs that have a signature. If this can be made to work despite the signature thing, I'd also love to know.

推荐答案

我认为没有一种完全矢量化的方法可以仅使用 numpy 函数来做到这一点(但我很高兴被证明是错误的!).

I don't think there is a fully vectorized way to do this with just numpy functions (but I'd be happy to be proven wrong!).

您可以通过使用itertools.accumulate 来隐藏循环以生成累积产品.这是一个例子.要创建 A,我将使用 N 随机正交矩阵,使用 scipy.stats.ortho_group 生成,以确保产品保持有界.

You can hide the loop by using itertools.accumulate to generate the cumulative products. Here's an example. To create A, I'll use N random orthogonal matrices, generated using scipy.stats.ortho_group, to ensure that the products remain bounded.

这里的前几行创建了 A,形状为 (1000, 4, 4).

The first few lines here create A, with shape (1000, 4, 4).

In [101]: from scipy.stats import ortho_group

In [102]: N = 1000

In [103]: D = 4

In [104]: A = ortho_group.rvs(D, size=N)

itertools.accumulate计算累积乘积,并将结果放入一个numpy数组中.

Compute the cumulative products with itertools.accumulate, and put the result in a numpy array.

In [105]: from itertools import accumulate

In [106]: B = np.array(list(accumulate(A, np.matmul)))   

验证我们从 cumdot(A) 得到相同的结果.

Verify that we get the same result from cumdot(A).

In [107]: def cumdot(A): 
     ...:     B = np.empty(A.shape) 
     ...:     B[0] = A[0] 
     ...:     for i in range(1, A.shape[0]):   
     ...:         B[i] = B[i - 1] @ A[i] 
     ...:     return B 
     ...:                                                                                         

In [108]: B0 = cumdot(A)                                                                          

In [109]: (B == B0).all()                                                                         
Out[109]: True

检查性能.事实证明,使用 itertools.accumulate 稍微快一些.

Check the performance. It turns out the using itertools.accumulate is slightly faster.

In [110]: %timeit B0 = cumdot(A)
2.89 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [111]: %timeit B = np.array(list(accumulate(A, np.matmul)))
2.44 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

这篇关于numpy 的累积点积的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆