Matrix multiplication with iterator dependency - NumPy
Some time back, this question (now deleted, but users with 10K+ reputation can still view it) was posted. It looked interesting to me, I learnt something new while trying to solve it, and I thought that was worth sharing. I would like to post those ideas/solutions, and would love to see people post other possible ways to solve it. I am posting the gist of the question next.
So, we have two NumPy ndarrays a and b with shapes:

a : (m,n,N)
b : (n,m,N)

Let's assume we are dealing with cases where m, n and N are comparable.
The problem is to compute the following multiplication and summation, with a focus on performance:
def all_loopy(a,b):
    P,Q,N = a.shape
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            for k in range(P):
                for n in range(Q):
                    d[i] += a[k,n,i] * b[n,k,j]
    return d
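As a sanity check on what all_loopy computes, here is a minimal comparison against a per-slice einsum reference (the function is repeated so the snippet runs standalone; the array sizes are arbitrary):

```python
import numpy as np

# all_loopy repeated here so the snippet is self-contained
def all_loopy(a, b):
    P, Q, N = a.shape
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            for k in range(P):
                for n in range(Q):
                    d[i] += a[k, n, i] * b[n, k, j]
    return d

# per-element check: d[i] sums the double contraction of
# a[:, :, i] against b[:, :, j] over all j < i
rng = np.random.default_rng(0)
a = rng.random((3, 4, 5))   # (m, n, N)
b = rng.random((4, 3, 5))   # (n, m, N)
d = all_loopy(a, b)
ref = np.array([sum(np.einsum('kn,nk->', a[:, :, i], b[:, :, j])
                    for j in range(i)) for i in range(5)])
assert np.allclose(d, ref)
```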
I learnt a few things along the way while trying to find vectorized and faster ways to solve it.
1) First off, there is a dependency between the iterators at for j in range(i). From my previous experience, especially with trying to solve such problems in MATLAB, it appeared that such a dependency could be taken care of with a lower triangular matrix, so np.tril should work here. Thus, a fully vectorized but not so memory-efficient solution (as it creates an intermediate (N,N) shaped array before finally reducing to an (N,) shaped array) would be -
def fully_vectorized(a,b):
    return np.tril(np.einsum('ijk,jil->kl',a,b),-1).sum(1)
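To confirm the tril trick does what is claimed, here is a minimal check of fully_vectorized against a direct loop (the definition is repeated so the snippet runs standalone; sizes are arbitrary):

```python
import numpy as np

def fully_vectorized(a, b):
    # einsum builds the full (N, N) matrix of pairwise contractions;
    # tril(..., -1) keeps only the strict j < i terms before the row-sum
    return np.tril(np.einsum('ijk,jil->kl', a, b), -1).sum(1)

rng = np.random.default_rng(1)
m, n, N = 3, 4, 5
a = rng.random((m, n, N))
b = rng.random((n, m, N))

# brute-force reference for the same quantity
ref = np.zeros(N)
for i in range(N):
    for j in range(i):
        ref[i] += (a[:, :, i] * b[:, :, j].T).sum()
assert np.allclose(fully_vectorized(a, b), ref)
```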
2) The next trick/idea was to keep one loop for the iterator i in for i in range(N), but insert that dependency with indexing and use np.einsum to perform all those multiplications and summations. The advantage would be memory efficiency. The implementation would look like this -
def einsum_oneloop(a,b):
    N = a.shape[2]
    d = np.zeros(N)
    for i in range(N):
        d[i] = np.einsum('ij,jik->',a[:,:,i],b[:,:,np.arange(i)])
    return d
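Worth noting: b[:, :, np.arange(i)] uses fancy indexing, which makes a copy, while the equivalent basic slice b[:, :, :i] returns a view. A sketch of that variant (einsum_oneloop_slice is an illustrative name, with N taken from a's shape rather than a global), checked against a brute-force loop:

```python
import numpy as np

def einsum_oneloop_slice(a, b):
    N = a.shape[2]            # avoid relying on a global N
    d = np.zeros(N)
    for i in range(N):
        # b[:, :, :i] is a view; b[:, :, np.arange(i)] would copy
        d[i] = np.einsum('ij,jik->', a[:, :, i], b[:, :, :i])
    return d

# quick brute-force check on small random inputs (sizes arbitrary)
rng = np.random.default_rng(0)
a = rng.random((3, 4, 5))
b = rng.random((4, 3, 5))
ref = np.zeros(5)
for i in range(5):
    for j in range(i):
        for k in range(3):
            for n in range(4):
                ref[i] += a[k, n, i] * b[n, k, j]
assert np.allclose(einsum_oneloop_slice(a, b), ref)
```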
There are two more obvious ways to solve it. So, if we start working from the original all_loopy solution, we could keep the outer two loops and use np.einsum or np.tensordot to perform those operations, thus removing the inner two loops, like so -
def tensordot_twoloop(a,b):
    N = a.shape[2]
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            d[i] += np.tensordot(a[:,:,i],b[:,:,j], axes=([1,0],[0,1]))
    return d

def einsum_twoloop(a,b):
    N = a.shape[2]
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            d[i] += np.einsum('ij,ji->',a[:,:,i],b[:,:,j])
    return d
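As a side note, the double contraction inside both two-loop versions is just the trace of the matrix product of the two slices; a quick check of that equivalence (the variable names here are illustrative):

```python
import numpy as np

# axes=([1,0],[0,1]) contracts x's axis 1 with y's axis 0 and
# x's axis 0 with y's axis 1, i.e. sum_{k,n} x[k,n]*y[n,k],
# which equals trace(x @ y) for 2-D inputs
rng = np.random.default_rng(3)
x = rng.random((3, 4))
y = rng.random((4, 3))
t = np.tensordot(x, y, axes=([1, 0], [0, 1]))
assert np.isclose(t, np.trace(x @ y))
assert np.isclose(t, np.einsum('ij,ji->', x, y))
```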
Runtime test
Let's compare all five approaches posted thus far to solve the problem, including the one posted in the question.
Case #1 :
In [26]: # Input arrays with random elements
...: m,n,N = 20,20,20
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [27]: %timeit all_loopy(a,b)
...: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
10 loops, best of 3: 79.6 ms per loop
100 loops, best of 3: 4.97 ms per loop
1000 loops, best of 3: 1.66 ms per loop
1000 loops, best of 3: 585 µs per loop
1000 loops, best of 3: 684 µs per loop
Case #2 :
In [28]: # Input arrays with random elements
...: m,n,N = 50,50,50
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [29]: %timeit all_loopy(a,b)
...: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
1 loops, best of 3: 3.1 s per loop
10 loops, best of 3: 54.1 ms per loop
10 loops, best of 3: 26.2 ms per loop
10 loops, best of 3: 27 ms per loop
10 loops, best of 3: 23.3 ms per loop
Case #3 (Leaving out all_loopy for being very slow) :
In [30]: # Input arrays with random elements
...: m,n,N = 100,100,100
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [31]: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
1 loops, best of 3: 1.08 s per loop
1 loops, best of 3: 744 ms per loop
1 loops, best of 3: 568 ms per loop
1 loops, best of 3: 866 ms per loop
Going by the numbers, einsum_oneloop looks pretty good to me, whereas fully_vectorized could be used when dealing with small to decently sized arrays!