Matrix/Tensor Triple Product?


Problem Description

An algorithm I'm working on requires computing, in a couple of places, a type of matrix triple product.

The operation takes three square matrices with identical dimensions and produces a 3-index tensor. Labeling the operands A, B and C, the (i,j,k)-th element of the result is

X[i,j,k] = \sum_a A[i,a] B[a,j] C[k,a]

In numpy, you can compute this with einsum('ia,aj,ka->ijk', A, B, C).
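As a quick illustration (a minimal sketch, not from the original question), the einsum call can be checked against the definition above with explicit loops on small random matrices; the size n below is just a hypothetical example:

import numpy as np

n = 4                         # hypothetical small size for checking
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = np.random.rand(n, n)

# Reference: X[i,j,k] = sum_a A[i,a] * B[a,j] * C[k,a], written as explicit loops
X_ref = np.zeros((n, n, n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            X_ref[i, j, k] = np.sum(A[i, :] * B[:, j] * C[k, :])

# einsum form from the question
X = np.einsum('ia,aj,ka->ijk', A, B, C)

print(np.allclose(X, X_ref))  # expected: True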

Questions:

  • Does this operation have a standard name?
  • Can I compute it with a single BLAS call?
  • Are there other heavily-optimized numerical C/Fortran libraries that can evaluate this type of expression?

Recommended Answer

Introduction and Solution Code

np.einsum is really hard to beat, but in rare cases you can still beat it if you can bring matrix multiplication into the computation. After a few trials, it seems that bringing in matrix multiplication with np.dot can surpass the performance of np.einsum('ia,aj,ka->ijk', A, B, C).

The basic idea is that we break down the "all einsum" operation into a combination of np.einsum and np.dot as listed below:

  • The products A[i,a] B[a,j] are computed with np.einsum, giving us a 3D array:[i,j,a] (the index a is kept, not yet summed).
  • This 3D array is then reshaped into a 2D array:[i*j,a], and the third array, C[k,a], is transposed to [a,k], with the intention of performing matrix multiplication between these two. That gives us [i*j,k] as the matrix product, since the index a is summed out there.
  • The product is reshaped into a 3D array:[i,j,k] for the final output.

Here's the implementation for the first version discussed so far -

import numpy as np

def tensor_prod_v1(A,B,C):   # First version of proposed method
    # Shape parameters
    m,d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Compute the elementwise products A[i,a]*B[a,j] as a 3D array with indices (i,j,a)
    AB = np.einsum('ia,aj->ija', A, B)

    # Sum over a via matrix multiplication with C.T & reshape to the desired output shape
    return np.dot(AB.reshape(m*n,d),C.T).reshape(m,n,p)

Since we are summing over the a-th index across all three input arrays, one can have three different methods to sum along the a-th index. The code listed earlier was for (A,B). Thus, we can also have (A,C) and (B,C), giving us two more variations, as listed next:

def tensor_prod_v2(A,B,C):
    # Shape parameters
    m,d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Compute the elementwise products A[i,a]*C[k,a] as a 3D array with indices (i,k,a)
    AC = np.einsum('ia,ja->ija', A, C)

    # Sum over a via matrix multiplication with B, then reshape/transpose to the desired output shape
    return np.dot(AC.reshape(m*p,d),B).reshape(m,p,n).transpose(0,2,1)

def tensor_prod_v3(A,B,C):
    # Shape parameters
    m,d = A.shape
    n = B.shape[1]
    p = C.shape[0]

    # Compute the elementwise products B[a,j]*C[k,a] as a 3D array with indices (a,j,k)
    BC = np.einsum('ai,ja->aij', B, C)

    # Sum over a via matrix multiplication with A & reshape to the desired output shape
    return np.dot(A,BC.reshape(d,n*p)).reshape(m,n,p)
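
As a quick sanity check (a small sketch, not part of the original answer), all three variants can be compared against the all-einsum result on small random inputs; the sizes below are hypothetical and follow the L1, L2, L3, al naming used in the timings that come next:

L1, L2, L3, al = 10, 12, 14, 16          # hypothetical small test sizes
A = np.random.rand(L1, al)
B = np.random.rand(al, L2)
C = np.random.rand(L3, al)

X_ref = np.einsum('ia,aj,ka->ijk', A, B, C)
print(np.allclose(tensor_prod_v1(A, B, C), X_ref))   # expected: True
print(np.allclose(tensor_prod_v2(A, B, C), X_ref))   # expected: True
print(np.allclose(tensor_prod_v3(A, B, C), X_ref))   # expected: True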

Depending on the shapes of the input arrays, different approaches would yield different speedups with respect to each other, but we are hopeful that all would be better than the all-einsum approach. The performance numbers are listed in the next section.

Runtime Tests

This is probably the most important section, as we look into the speedup numbers for the three variations of the proposed approach over the all-einsum approach originally proposed in the question.

Dataset #1 (Equal shaped arrays):

In [494]: L1 = 200
     ...: L2 = 200
     ...: L3 = 200
     ...: al = 200
     ...: 
     ...: A = np.random.rand(L1,al)
     ...: B = np.random.rand(al,L2)
     ...: C = np.random.rand(L3,al)
     ...: 

In [495]: %timeit tensor_prod_v1(A,B,C)
     ...: %timeit tensor_prod_v2(A,B,C)
     ...: %timeit tensor_prod_v3(A,B,C)
     ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
     ...: 
1 loops, best of 3: 470 ms per loop
1 loops, best of 3: 391 ms per loop
1 loops, best of 3: 446 ms per loop
1 loops, best of 3: 3.59 s per loop

Dataset #2 (Bigger A):

In [497]: L1 = 1000
     ...: L2 = 100
     ...: L3 = 100
     ...: al = 100
     ...: 
     ...: A = np.random.rand(L1,al)
     ...: B = np.random.rand(al,L2)
     ...: C = np.random.rand(L3,al)
     ...: 

In [498]: %timeit tensor_prod_v1(A,B,C)
     ...: %timeit tensor_prod_v2(A,B,C)
     ...: %timeit tensor_prod_v3(A,B,C)
     ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
     ...: 
1 loops, best of 3: 442 ms per loop
1 loops, best of 3: 355 ms per loop
1 loops, best of 3: 303 ms per loop
1 loops, best of 3: 2.42 s per loop

Dataset #3 (Bigger B):

In [500]: L1 = 100
     ...: L2 = 1000
     ...: L3 = 100
     ...: al = 100
     ...: 
     ...: A = np.random.rand(L1,al)
     ...: B = np.random.rand(al,L2)
     ...: C = np.random.rand(L3,al)
     ...: 

In [501]: %timeit tensor_prod_v1(A,B,C)
     ...: %timeit tensor_prod_v2(A,B,C)
     ...: %timeit tensor_prod_v3(A,B,C)
     ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
     ...: 
1 loops, best of 3: 474 ms per loop
1 loops, best of 3: 247 ms per loop
1 loops, best of 3: 439 ms per loop
1 loops, best of 3: 2.26 s per loop

Dataset #4 (Bigger C):

In [503]: L1 = 100
     ...: L2 = 100
     ...: L3 = 1000
     ...: al = 100
     ...: 
     ...: A = np.random.rand(L1,al)
     ...: B = np.random.rand(al,L2)
     ...: C = np.random.rand(L3,al)

In [504]: %timeit tensor_prod_v1(A,B,C)
     ...: %timeit tensor_prod_v2(A,B,C)
     ...: %timeit tensor_prod_v3(A,B,C)
     ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
     ...: 
1 loops, best of 3: 250 ms per loop
1 loops, best of 3: 358 ms per loop
1 loops, best of 3: 362 ms per loop
1 loops, best of 3: 2.46 s per loop

Dataset #5 (Bigger a-th dimension length):

In [506]: L1 = 100
     ...: L2 = 100
     ...: L3 = 100
     ...: al = 1000
     ...: 
     ...: A = np.random.rand(L1,al)
     ...: B = np.random.rand(al,L2)
     ...: C = np.random.rand(L3,al)
     ...: 

In [507]: %timeit tensor_prod_v1(A,B,C)
     ...: %timeit tensor_prod_v2(A,B,C)
     ...: %timeit tensor_prod_v3(A,B,C)
     ...: %timeit np.einsum('ia,aj,ka->ijk', A, B, C)
     ...: 
1 loops, best of 3: 373 ms per loop
1 loops, best of 3: 269 ms per loop
1 loops, best of 3: 299 ms per loop
1 loops, best of 3: 2.38 s per loop

Conclusions: We are seeing a speedup of 8x-10x with the variations of the proposed approach over the all-einsum approach listed in the question.
