为什么numpy的克朗这么快? [英] Why is numpy's kron so fast?
问题描述
我正在尝试实现 kronecker产品功能.以下是我的三个想法:
I was trying to implement a kronecker product function. Below are three ideas that I have:
def kron(arr1, arr2):
"""columnwise outer product, avoiding relocate elements.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
res = np.empty((nrows, ncols))
for idx1 in range(c1):
for idx2 in range(c2):
new_c = idx1 * c2 + idx2
temp = np.zeros((r2, r1))
temp_kron = scipy.linalg.blas.dger(
alpha=1.0, x=arr2[:, idx2], y=arr1[:, idx1], incx=1, incy=1,
a=temp)
res[:, new_c] = np.ravel(temp_kron, order='F')
return res
def kron2(arr1, arr2):
"""First outer product, then rearrange items.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
tmp = np.outer(arr2, arr1)
res = np.empty((nrows, ncols))
for idx in range(arr1.size):
for offset in range(c2):
orig = tmp[offset::c2, idx]
dest_coffset = idx % c1 * c2 + offset
dest_roffset = (idx // c1) * r2
res[dest_roffset:dest_roffset+r2, dest_coffset] = orig
return res
def kron3(arr1, arr2):
"""First outer product, then rearrange items.
"""
r1, c1 = arr1.shape
r2, c2 = arr2.shape
nrows, ncols = r1 * r2, c1 * c2
tmp = np.outer(np.ravel(arr2, 'F'), np.ravel(arr1, 'F'))
res = np.empty((nrows, ncols))
for idx in range(arr1.size):
for offset in range(c2):
orig_offset = offset * r2
orig = tmp[orig_offset:orig_offset+r2, idx]
dest_c = idx // r1 * c2 + offset
dest_r = idx % r1 * r2
res[dest_r:dest_r+r2, dest_c] = orig
return res
基于此 stackoverflow 帖子,我创建了一个MeasureTime装饰器.一个自然的基准将是与 numpy.kron .下面是我的测试功能:
Based on this stackoverflow post I created a MeasureTime decorator. A natural benchmark would be to compare against numpy.kron. Below are my test functions:
@MeasureTime
def test_np_kron(arr1, arr2, number=1000):
for _ in range(number):
np.kron(arr1, arr2)
return
@MeasureTime
def test_kron(arr1, arr2, number=1000):
for _ in range(number):
kron(arr1, arr2)
@MeasureTime
def test_kron2(arr1, arr2, number=1000):
for _ in range(number):
kron2(arr2, arr1)
@MeasureTime
def test_kron3(arr1, arr2, number=1000):
for _ in range(number):
kron3(arr2, arr1)
事实证明Numpy的kron函数性能要好得多:
Turned out that Numpy's kron function performances much better:
arr1 = np.array([[1,-4,7], [-2, 3, 3]], dtype=np.float64, order='F')
arr2 = np.array([[8, -9, -6, 5], [1, -3, -4, 7], [2, 8, -8, -3], [1, 2, -5, -1]], dtype=np.float64, order='F')
In [243]: test_np_kron(arr1, arr2, number=10000)
Out [243]: "test_np_kron": 0.19688990000577178s
In [244]: test_kron(arr1, arr2, number=10000)
Out [244]: "test_kron": 0.6094115000014426s
In [245]: test_kron2(arr1, arr2, number=10000)
Out [245]: "test_kron2": 0.5699560000066413s
In [246]: test_kron3(arr1, arr2, number=10000)
Out [246]: "test_kron3": 0.7134822000080021s
我想知道为什么会这样吗?那是因为Numpy的重塑方法比手动复制内容(尽管仍然使用numpy)具有更高的性能?我很困惑,因为否则我也在使用np.outer/blas.dger.我在这里认识到的唯一区别是我们如何安排最终结果. NumPy的重塑效果如何?
I would like to know why that is the case? Is that because Numpy's reshape method is much more performant than manually copying over stuff (although still using numpy)? I was puzzled, since otherwise, I was using np.outer / blas.dger as well. The only difference I recognized here was how we arranged the ending results. How come NumPy's reshape perform this good?
此处是NumPy 1.17 kron来源的链接.
Here is the link to NumPy 1.17 kron source.
更新: 首先忘了提到我试图在python中进行原型制作,然后使用Cblas/lapack和C ++来实现kron.是否有一些现有的克朗"需要重构.然后,我遇到了Numpy的 reshape ,并对此印象深刻.
Updates: Forgot to mention in the first place that I was trying to prototype in python, and then implement kron using C++ with cblas/lapack. Had some existing 'kron' needing to be refactored. I then came across Numpy's reshape and got really impressed.
提前感谢您的时间!
推荐答案
让我们尝试2个小数组:
Let's experiment with 2 small arrays:
In [124]: A, B = np.array([[1,2],[3,4]]), np.array([[10,11],[12,13]])
kron
产生:
In [125]: np.kron(A,B)
Out[125]:
array([[10, 11, 20, 22],
[12, 13, 24, 26],
[30, 33, 40, 44],
[36, 39, 48, 52]])
outer
产生相同的数字,但排列不同:
outer
produces the same numbers, but with a different arangement:
In [126]: np.outer(A,B)
Out[126]:
array([[10, 11, 12, 13],
[20, 22, 24, 26],
[30, 33, 36, 39],
[40, 44, 48, 52]])
kron
将其重塑为A
和B
的形状的组合:
kron
reshapes it to a combination of the shapes of A
and B
:
In [127]: np.outer(A,B).reshape(2,2,2,2)
Out[127]:
array([[[[10, 11],
[12, 13]],
[[20, 22],
[24, 26]]],
[[[30, 33],
[36, 39]],
[[40, 44],
[48, 52]]]])
然后使用concatenate
将4个维度重新组合为2个:
it then recombines 4 dimensions into 2 with concatenate
:
In [128]: np.concatenate(np.concatenate(_127, 1),1)
Out[128]:
array([[10, 11, 20, 22],
[12, 13, 24, 26],
[30, 33, 40, 44],
[36, 39, 48, 52]])
另一种方法是交换轴并重塑形状:
An alternative is to swap axes, and reshape:
In [129]: _127.transpose(0,2,1,3).reshape(4,4)
Out[129]:
array([[10, 11, 20, 22],
[12, 13, 24, 26],
[30, 33, 40, 44],
[36, 39, 48, 52]])
第一次重塑和转置产生一个视图,但是第二次重塑必须产生一个副本.串联将创建一个副本.但是所有这些动作都是在已编译的numpy
代码中完成的.
The first reshape and transpose produce a view, but the second reshape has to produce a copy. Concatenate makes a copy. But all those actions are done in compiled numpy
code.
定义功能:
def foo1(A,B):
temp = np.outer(A,B)
temp = temp.reshape(A.shape + B.shape)
return np.concatenate(np.concatenate(temp, 1), 1)
def foo2(A,B):
temp = np.outer(A,B)
nz = temp.shape
temp = temp.reshape(A.shape + B.shape)
return temp.transpose(0,2,1,3).reshape(nz)
测试:
In [141]: np.allclose(np.kron(A,B), foo1(A,B))
Out[141]: True
In [142]: np.allclose(np.kron(A,B), foo2(A,B))
Out[142]: True
时间:
In [143]: timeit np.kron(A,B)
42.4 µs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [145]: timeit foo1(A,B)
26.3 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [146]: timeit foo2(A,B)
13.8 µs ± 19.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
我的代码可能需要一些概括,但是它证明了该方法的有效性.
My code may need some generalization, but it demonstrates the validity of the approach.
===
使用您的kron
:
In [150]: kron(A,B)
Out[150]:
array([[10., 11., 20., 22.],
[12., 13., 24., 26.],
[30., 33., 40., 44.],
[36., 39., 48., 52.]])
In [151]: timeit kron(A,B)
55.3 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
这篇关于为什么numpy的克朗这么快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!