Apply function along axis over two numpy arrays - shapes not aligned


Problem description

I'm probably not seeing something obvious here but don't believe np.apply_along_axis or np.apply_over_axes is what I'm looking for. Say I have the following two arrays:

import numpy as np

arr1 = np.random.randn(10, 5)
arr2 = np.random.randn(10)

And the following function:

def coefs(x, y):
    # the vector of coefficients in a multiple linear regression
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))

Calling this on arr1 and arr2 works smoothly as it should:

coefs(arr1, arr2)
Out[111]: array([-0.19474836, -0.50797551,  0.82903805,  0.06332607, -0.26985597])

However, suppose that instead of the 2d and 1d arrays above I have a 3d array and a 2d array:

arr3 = np.array([arr1[:-1], arr1[1:]])
arr4 = np.array([arr2[:-1], arr2[1:]])

As expected, if I apply the function here I get

coefs(arr3, arr4)
Traceback (most recent call last):

  File "<ipython-input-127-4a3e7df02cda>", line 1, in <module>
    coefs(arr3, arr4)

  File "<ipython-input-124-7532b8516784>", line 2, in coefs
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))

ValueError: shapes (5,9,2) and (2,9,5) not aligned: 2 (dim 2) != 9 (dim 1)

...because NumPy treats each array as a single object, as it should. What I want to do instead is apply the coefs() function to each of the 2 elements along axis 0 of the arrays, pairing them up element-wise. Here's a crude way of doing this:

tgt = []
for i, j in zip(arr3, arr4):
    tgt.append(coefs(i, j))

np.array(tgt) 
Out[136]: 
array([[-0.34328006, -0.99116672,  1.42757897, -0.06687851, -0.44669182],
       [ 0.44494495, -0.58017705,  0.75825944,  0.18795889,  0.4560851 ]])

My question is, is there a more efficient and pythonic way of doing this than using zip and iterating over, as above? Basically, given two input arrays of shape (2, n, k) and (2, n), I want the array that is returned to be of shape (2, k). Thanks.
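For reference, the shapes in the traceback can be reproduced without any regression maths. The sketch below assumes only that the 3d input has shape (2, 9, 5), as arr3 does; the zero-filled array and the names `dot_failed` and `per_slice` are made up for illustration. It shows that `.T` reverses all axes and that np.dot then tries to match the wrong pair of axes, whereas swapping only the last two axes and using np.matmul gives one product per slice.

```python
import numpy as np

# Stand-in with the same shape as arr3 in the question (values do not matter here).
arr3 = np.zeros((2, 9, 5))

# .T with no arguments reverses every axis, not just the last two.
print(arr3.T.shape)  # (5, 9, 2)

# For N-D inputs, np.dot contracts the last axis of the first argument
# (length 2) with the second-to-last axis of the second (length 9),
# so the call fails with the "shapes not aligned" error.
try:
    np.dot(arr3.T, arr3)
    dot_failed = False
except ValueError as exc:
    dot_failed = True
    print(exc)

# Swapping only the last two axes and using matmul multiplies slice by slice.
per_slice = np.matmul(arr3.transpose(0, 2, 1), arr3)
print(per_slice.shape)  # (2, 5, 5)
```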

Solution

For a generic 3D array arr3 and 2D array arr4, we can use some np.einsum magic to get a vectorized solution, like so -

dot1 = np.einsum('ijk,ijl->ikl',arr3,arr3)
dot2 = np.einsum('ijk,ij->ik',arr3,arr4)
inv1 = np.linalg.inv(dot1)
tgt_out = np.einsum('ijk,ij->ik',inv1, dot2)
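To make the subscripts less magical, here is a small check of what each einsum call computes, compared slice by slice with the plain matrix products from coefs(). The input shapes follow the question; the random seed and the loop are only illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for repeatability only
arr3 = rng.standard_normal((2, 9, 5))   # (m, n, k), as in the question
arr4 = rng.standard_normal((2, 9))      # (m, n)

# 'ijk,ijl->ikl': for every i, sum over j -> X_i.T @ X_i, shape (m, k, k)
dot1 = np.einsum('ijk,ijl->ikl', arr3, arr3)
# 'ijk,ij->ik': for every i, sum over j -> X_i.T @ y_i, shape (m, k)
dot2 = np.einsum('ijk,ij->ik', arr3, arr4)

for i in range(arr3.shape[0]):
    X, y = arr3[i], arr4[i]
    assert np.allclose(dot1[i], X.T @ X)
    assert np.allclose(dot2[i], X.T @ y)

# np.linalg.inv accepts a stack of square matrices and inverts each one.
inv1 = np.linalg.inv(dot1)

# This sums over j, the first matrix axis of inv1, i.e. inv1[i].T @ dot2[i].
# That equals inv1[i] @ dot2[i] here because the inverse of the symmetric
# matrix X_i.T @ X_i is itself symmetric.
tgt_out = np.einsum('ijk,ij->ik', inv1, dot2)
print(tgt_out.shape)  # (2, 5)
```

Note that the last call relies on that symmetry; for a non-symmetric stack of matrices the subscripts would need to be 'ikj,ij->ik' to get an ordinary matrix-vector product.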

Runtime test

Approaches -

def org_app(arr3, arr4):
    tgt = []
    for i, j in zip(arr3, arr4):
        tgt.append(coefs(i, j))
    return np.array(tgt)

def einsum_app(arr3, arr4):
    dot1 = np.einsum('ijk,ijl->ikl',arr3,arr3)
    dot2 = np.einsum('ijk,ij->ik',arr3,arr4)
    inv1 = np.linalg.inv(dot1)
    return np.einsum('ijk,ij->ik',inv1, dot2)

Timings and verification -

In [215]: arr3 = np.random.rand(50,50,50)
     ...: arr4 = np.random.rand(50,50)
     ...: 

In [216]: np.allclose(org_app(arr3, arr4), einsum_app(arr3, arr4))
Out[216]: True

In [217]: %timeit org_app(arr3, arr4)
100 loops, best of 3: 4.82 ms per loop

In [218]: %timeit einsum_app(arr3, arr4)
100 loops, best of 3: 19.7 ms per loop

It doesn't look like einsum is giving us any benefit here. This is expected, because einsum is essentially competing against np.dot, which is much better at sum-reductions, even when it is called inside a loop. The only case in which einsum can give np.dot a fight is when the loop runs for enough iterations. The number of iterations equals the length of the first axis of the input arrays, so let's increase that and test again -

In [219]: arr3 = np.random.rand(1000,10,10)
     ...: arr4 = np.random.rand(1000,10)
     ...: 

In [220]: %timeit org_app(arr3, arr4)
10 loops, best of 3: 23 ms per loop

In [221]: %timeit einsum_app(arr3, arr4)
100 loops, best of 3: 9.1 ms per loop

einsum definitely winning on this one!
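As a side note, the same batched computation can also be written without einsum, using broadcasting matmul (`@`) and np.linalg.solve. This is only a sketch of an alternative and is not part of the timings above: the name `matmul_solve_app` is made up here, and it solves the normal equations instead of forming an explicit inverse, so it is a swapped-in technique and no claim is made about how its speed compares.

```python
import numpy as np

def matmul_solve_app(arr3, arr4):
    # arr3: (m, n, k), arr4: (m, n)
    xt = arr3.transpose(0, 2, 1)        # (m, k, n): per-slice transpose
    gram = xt @ arr3                    # (m, k, k): X_i.T @ X_i for every i
    rhs = xt @ arr4[..., None]          # (m, k, 1): X_i.T @ y_i as column vectors
    # solve() broadcasts over the leading axis; keeping rhs as a stack of
    # column vectors avoids any ambiguity about how a 2D right-hand side is read.
    return np.linalg.solve(gram, rhs)[..., 0]   # (m, k)

rng = np.random.default_rng(0)          # arbitrary seed
a3 = rng.random((1000, 10, 10))
a4 = rng.random((1000, 10))
print(matmul_solve_app(a3, a4).shape)   # (1000, 10)
```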

This related post on the fight between np.einsum and np.dot is worth a look.

Also, note that if we need to use the loop based approach, we should look to initialize the output array and then assign the output values from coefs into it rather than appending, as the latter is a slow process.
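A minimal sketch of that preallocated loop could look like the following. The function name `loop_prealloc` and the choice of output dtype are assumptions made for illustration; coefs() is repeated from the question so the snippet runs on its own.

```python
import numpy as np

def coefs(x, y):
    # the vector of coefficients in a multiple linear regression
    return np.dot(np.linalg.inv(np.dot(x.T, x)), np.dot(x.T, y))

def loop_prealloc(arr3, arr4):
    m, _, k = arr3.shape
    # Allocate the (m, k) result once; use at least float64 so that
    # integer inputs do not truncate the fitted coefficients.
    out = np.empty((m, k), dtype=np.result_type(arr3, arr4, np.float64))
    for i in range(m):
        out[i] = coefs(arr3[i], arr4[i])   # write row i in place
    return out

rng = np.random.default_rng(0)   # arbitrary seed
arr3 = rng.random((50, 50, 50))
arr4 = rng.random((50, 50))
print(loop_prealloc(arr3, arr4).shape)   # (50, 50)
```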
