Why is B = numpy.dot(A,x) so much slower than looping through and doing B[i,:,:] = numpy.dot(A[i,:,:],x)?


Question

I'm getting some efficiency test results that I can't explain.

I want to assemble an array B whose i-th slice is B[i,:,:] = A[i,:,:].dot(x), where each A[i,:,:] is a 2D matrix, and so is x.

I can do this three ways. To test performance, I make random (numpy.random.randn) arrays with shapes A = (10,1000,1000) and x = (1000,1200). I get the following timings:

(1) a single multidimensional dot product

B = A.dot(x)

total time: 102.361 s

(2) looping through i and performing 2D dot products

   # initialize B = np.zeros([dim1, dim2, dim3])
   for i in range(A.shape[0]):
       B[i,:,:] = A[i,:,:].dot(x)

total time: 0.826 s

(3) numpy.einsum

B3 = np.einsum("ijk, kl -> ijl", A, x)

total time: 8.289 s

So option (2) is by far the fastest. But considering just (1) and (2), I don't see a big difference between them. How can looping through and doing 2D dot products be ~124 times faster? They both use numpy.dot. Any insights?

I include the code used for the above results just below:

import numpy as np
import numpy.random as npr
import time

dim1, dim2, dim3 = 10, 1000, 1200
A = npr.randn(dim1, dim2, dim2)
x = npr.randn(dim2, dim3)

# consider three ways of assembling the same array B: B1, B2, B3

# (1) a single multidimensional dot product
t = time.time()
B1 = np.dot(A, x)
td1 = time.time() - t
print("a single dot product of A [shape = (%d, %d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td1))

# (2) loop over i; each slice A[i,:,:].dot(x) has shape (A.shape[1], x.shape[1])
B2 = np.zeros([A.shape[0], A.shape[1], x.shape[1]])
t = time.time()
for i in range(A.shape[0]):
    B2[i,:,:] = np.dot(A[i,:,:], x)
td2 = time.time() - t
print("taking %d 2D dot products of A[i,:,:] [shape = (%d, %d)] with x [shape = (%d, %d)] completes in %.3f s"
      % (A.shape[0], A.shape[1], A.shape[2], x.shape[0], x.shape[1], td2))

# (3) numpy.einsum
t = time.time()
B3 = np.einsum("ijk, kl -> ijl", A, x)
td3 = time.time() - t
print("using np.einsum, it completes in %.3f s" % td3)

Recommended answer

With smaller dims (10, 100, 200), I get a similar ranking:

In [355]: %%timeit
   .....: B=np.zeros((N,M,L))
   .....: for i in range(N):
   .....:     B[i,:,:]=np.dot(A[i,:,:],x)
   .....: 
10 loops, best of 3: 22.5 ms per loop
In [356]: timeit np.dot(A,x)
10 loops, best of 3: 44.2 ms per loop
In [357]: timeit np.einsum('ijk,km->ijm',A,x)
10 loops, best of 3: 29 ms per loop

In [367]: timeit np.dot(A.reshape(-1,M),x).reshape(N,M,L)
10 loops, best of 3: 22.1 ms per loop

In [375]: timeit np.tensordot(A,x,(2,0))
10 loops, best of 3: 22.2 ms per loop

The iterative version is faster, though not by as much as in your case.

This is probably true as long as the iterated dimension is small compared to the other ones. In that case the overhead of iteration (function calls, etc.) is small compared to the calculation time, and doing all the values at once uses more memory.
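For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard timeit module. The shapes are an assumption: "dims 10,100,200" is mapped onto the question's (dim1, dim2, dim2) and (dim2, dim3) layout. The helper names loop_dot, single_dot and time_call are hypothetical, and timings will vary with the NumPy build and machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

# Assumed shapes: N, M, L = 10, 100, 200 mapped onto A (N, M, M) and x (M, L).
N, M, L = 10, 100, 200
rng = np.random.default_rng(0)
A = rng.standard_normal((N, M, M))
x = rng.standard_normal((M, L))

def loop_dot(A, x):
    """Fill B one slice at a time with 2D dot products."""
    B = np.empty((A.shape[0], A.shape[1], x.shape[1]))
    for i in range(A.shape[0]):
        B[i] = np.dot(A[i], x)
    return B

def single_dot(A, x):
    """One call to np.dot on the full 3D array."""
    return np.dot(A, x)

def time_call(fn, repeat=3, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: fn(A, x), repeat=repeat, number=number)
    return min(runs) / number

t_loop = time_call(loop_dot)
t_single = time_call(single_dot)
print("loop of 2D dots: %.3f ms" % (1e3 * t_loop))
print("single 3D dot:   %.3f ms" % (1e3 * t_single))
```

The loop only makes N Python-level calls, so its overhead stays small whenever N is small relative to the work done inside each 2D product.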

I tried a dot variation where I reshaped A into 2D, thinking that dot does that kind of reshaping internally. I'm surprised that it is actually the fastest. tensordot is probably doing the same reshaping (its code is readable Python).
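A quick check that the reshaped dot, tensordot, and einsum variants from the timings above all produce the same array as the plain np.dot(A, x). The small sizes here are illustrative assumptions chosen so the script runs instantly; they are not the sizes that were timed.

```python
import numpy as np

# Illustrative sizes only (assumed); A is (N, M, M), x is (M, L).
N, M, L = 4, 5, 6
rng = np.random.default_rng(0)
A = rng.standard_normal((N, M, M))
x = rng.standard_normal((M, L))

B_dot = np.dot(A, x)

# Collapse the first two axes of A into one, do a single 2D product,
# then restore the leading (N, M) axes.
B_reshape = np.dot(A.reshape(-1, M), x).reshape(N, M, L)

# Contract axis 2 of A with axis 0 of x.
B_tensordot = np.tensordot(A, x, (2, 0))

B_einsum = np.einsum('ijk,km->ijm', A, x)

all_equal = all(
    np.allclose(B_dot, other)
    for other in (B_reshape, B_tensordot, B_einsum)
)
print(B_dot.shape, all_equal)
```

The reshape is valid because the contraction only involves the last axis of A, so the leading axes can be flattened and restored without changing any element.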

einsum sets up a 'sum of products' iteration involving 4 variables, i, j, k, m; that is, dim1*dim2*dim2*dim3 steps with the C-level nditer. So the more indices you have, the larger the iteration space.
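To put a number on that iteration space, here is a small arithmetic sketch using the sizes from the question. The helper iteration_space is hypothetical and only multiplies the index extents together; it does not inspect what einsum actually does internally.

```python
from functools import reduce
from operator import mul

# Sizes from the question: A is (dim1, dim2, dim2), x is (dim2, dim3).
dim1, dim2, dim3 = 10, 1000, 1200

# Extent of each index in 'ijk,km->ijm'.
index_sizes = {"i": dim1, "j": dim2, "k": dim2, "m": dim3}

def iteration_space(sizes):
    """Product of all index extents (hypothetical helper)."""
    return reduce(mul, sizes.values(), 1)

steps = iteration_space(index_sizes)
output_elements = dim1 * dim2 * dim3

print(steps)                     # 12000000000
print(output_elements)           # 12000000
print(steps // output_elements)  # 1000 products summed per output element
```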
