Why is B = numpy.dot(A,x) so much slower than looping through B[i,:,:] = numpy.dot(A[i,:,:],x)?

Question

I'm getting some efficiency test results that I can't explain.

I want to assemble an array B whose i-th entry is B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.

I can do this three ways. To test performance, I generate random (numpy.random.randn) arrays A with shape (10, 1000, 1000) and x with shape (1000, 1200). I get the following timings:

(1) a single multidimensional dot product

B = A.dot(x)

total time: 102.361 s

(2) looping through i and performing 2D dot products

   # initialize B = np.zeros([dim1, dim2, dim3])
   for i in range(A.shape[0]):
       B[i,:,:] = A[i,:,:].dot(x)

total time: 0.826 s

(3) numpy.einsum

B3 = np.einsum("ijk, kl -> ijl", A, x)

total time: 8.289 s

So, option (2) is the fastest by far. But, considering just (1) and (2), I don't see the big difference between them. How can looping through and doing 2D dot products be ~ 124 times faster? They both use numpy.dot. Any insights?

I include the code used for the above results just below:

import numpy as np
import numpy.random as npr
import time

dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)

# consider three ways of assembling the same matrix B: B1, B2, B3

t = time.time()
B1 = np.dot(A,x)
td1 = time.time() - t
print("a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1))


# each A[i,:,:].dot(x) has shape (A.shape[1], x.shape[1])
B2 = np.zeros([A.shape[0], A.shape[1], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
    B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print("taking %d 2D dot products of A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2))

t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print("using np.einsum, it completes in %.3f s" % td3)

Recommended answer

With smaller dims (10, 100, 200), I get a similar ranking:

In [355]: %%timeit
   .....: B=np.zeros((N,M,L))
   .....: for i in range(N):
   .....:     B[i,:,:]=np.dot(A[i,:,:],x)
   .....: 
10 loops, best of 3: 22.5 ms per loop
In [356]: timeit np.dot(A,x)
10 loops, best of 3: 44.2 ms per loop
In [357]: timeit np.einsum('ijk,km->ijm',A,x)
10 loops, best of 3: 29 ms per loop

In [367]: timeit np.dot(A.reshape(-1,M),x).reshape(N,M,L)
10 loops, best of 3: 22.1 ms per loop

In [375]: timeit np.tensordot(A,x,(2,0))
10 loops, best of 3: 22.2 ms per loop
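
The session above does not show how A, x, N, M and L were created. A minimal sketch for reproducing the comparison outside IPython follows. It assumes A has shape (N, M, M) and x has shape (M, L), which is consistent with the reshape calls above but is not stated in the answer. The function names (loop_dot, single_dot, einsum_dot) are made up for illustration, and the timings will differ by machine, NumPy version and BLAS build.

```python
import timeit
import numpy as np

# Assumed shapes: A is (N, M, M) and x is (M, L); the answer only says "dims 10,100,200".
N, M, L = 10, 100, 200
rng = np.random.default_rng(0)
A = rng.standard_normal((N, M, M))
x = rng.standard_normal((M, L))

def loop_dot():
    # iterate over the first axis and do one 2D dot per slice
    B = np.zeros((N, M, L))
    for i in range(N):
        B[i, :, :] = np.dot(A[i, :, :], x)
    return B

def single_dot():
    # one call on the full 3D array
    return np.dot(A, x)

def einsum_dot():
    return np.einsum('ijk,km->ijm', A, x)

variants = {"loop of 2D dots": loop_dot, "single np.dot": single_dot, "np.einsum": einsum_dot}
timings = {}
for name, fn in variants.items():
    # best of 3 repeats of 10 calls each, reported per call
    timings[name] = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print("%-16s %.3f ms per call" % (name, timings[name] * 1e3))
```

Taking the minimum over repeats mirrors the "best of 3" figure that %timeit reports in the session above.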

The iterative version is faster, though not by as much as in your case.

This is probably true as long as that iterating dimension is small compared to the other ones. In that case the overhead of iteration (function calls etc) is small compared to the calculation time. And doing all the values at once uses more memory.

I tried a dot variation where I reshaped A into 2D, thinking that dot does that kind of reshaping internally. I'm surprised that it is actually the fastest. tensordot is probably doing the same reshaping (that code is readable Python).
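
As a quick sanity check, the sketch below confirms that the reshape variant and tensordot produce the same array as the explicit loop. The small sizes are hypothetical and chosen only to keep the check fast. A is assumed to have shape (N, M, M) and x shape (M, L), as the reshape expression implies.

```python
import numpy as np

# Hypothetical small sizes, chosen only so the check runs instantly.
N, M, L = 4, 5, 6
rng = np.random.default_rng(0)
A = rng.standard_normal((N, M, M))
x = rng.standard_normal((M, L))

# Reference result: one 2D dot per slice along the first axis.
B_loop = np.stack([np.dot(A[i], x) for i in range(N)])

# Collapse the first two axes, do a single 2D dot, then restore the 3D shape.
B_reshape = np.dot(A.reshape(-1, M), x).reshape(N, M, L)

# Contract axis 2 of A with axis 0 of x.
B_tensordot = np.tensordot(A, x, (2, 0))

print(np.allclose(B_loop, B_reshape), np.allclose(B_loop, B_tensordot))
```

The reshape is valid because the contraction only touches the last axis of A, so the leading two axes can be flattened into one row index and unflattened afterwards.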

einsum sets up a 'sum of products' iteration involving 4 variables, the i,j,k,m - that is dim1*dim2*dim2*dim3 steps with the C level nditer. So the more indices you have the larger the iteration space.
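
To put a number on that iteration space, here is the arithmetic for the shapes in the question. The variable names einsum_steps and output_elements are illustrative only.

```python
# Shapes from the question: A is (dim1, dim2, dim2), x is (dim2, dim3).
dim1, dim2, dim3 = 10, 1000, 1200

# "ijk,kl->ijl": one multiply-add for every (i, j, k, l) combination.
einsum_steps = dim1 * dim2 * dim2 * dim3

# The result B has one element per (i, j, l) combination.
output_elements = dim1 * dim2 * dim3

print(einsum_steps)                     # 12000000000
print(einsum_steps // output_elements)  # 1000 multiply-adds per output element
```

A plain matrix product needs the same number of multiply-adds, so the count alone does not explain the gap. The point made above is that einsum walks this space with a generic iterator instead of handing the work to an optimized 2D matrix-multiply routine.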
