How to apply a generic function over numpy rows?


Question

Before you flag this as a duplicate, let me explain that I have read this page and many others and I still haven't found a solution to my problem.

This is the problem I'm having: given two 2D arrays, I want to apply a function F over the two arrays. F takes as input two 1D arrays.

import numpy as np
a = np.arange(15).reshape([3,5])
b = np.arange(30, step=2).reshape([3,5])

# what is the 'numpy' equivalent of the following?
np.array([np.dot(x,y) for x,y in zip(a,b)])

Please note that np.dot is just for demonstration. The real question here is any generic function F that works over two sets of 1D arrays.

  • vectorizing either fails outright with an error or it applies the function element-by-element, instead of array-by-array (or row-by-row)
  • np.apply_along_axis applies the function iteratively; for example, using the variables defined above, it does F(a[0], b[0]) and combines this with F(a[0], b[1]) and F(a[0], b[2]). This is not what I'm looking for. Ideally, I would want it to stop at just F(a[0], b[0])
  • index slicing / advanced slicing doesn't do what I would like either. For one, if I do something like np.dot(a[np.arange(3)], b[np.arange(3)]) this throws a ValueError saying that shapes (3,5) and (3,5) are not aligned. I don't know how to fix this.

I tried to solve this in any way I could, but the only solution I've come up with that works is using list comprehension. But I'm worried about the cost to performance as a result of using list comprehension. I would like to achieve the same effect using a numpy operation, if possible. How do I do this?

Recommended answer

This type of question has been beat to death on SO, but I'll try to illustrate the issues with your framework:

In [1]: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...: 
In [2]: def f(x,y):
   ...:     return np.dot(x,y)

zipped comprehension

The list comprehension approach applies f to the 3 rows of a and b. That is, it iterates on the 2 arrays as though they were lists. At each call, your function gets two 1d arrays. dot can accept other shapes, but for the moment we'll pretend that it only works with a pair of 1d arrays.

In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([  60,  510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60

vectorize/frompyfunc

np.vectorize iterates over the inputs (with broadcasting, which can be handy), and gives the function scalar values. I'll illustrate with frompyfunc, which returns an object dtype array (and is used by vectorize):

In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]: 
array([[0, 2, 8, 18, 32],
       [50, 72, 98, 128, 162],
       [200, 242, 288, 338, 392]], dtype=object)

So the result is a (3,5) array of elementwise products; incidentally, summing across columns gets the desired result:

In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)

np.vectorize does not make any speed promises.
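
If the aim is to have np.vectorize hand whole rows to the function instead of scalars, its signature parameter (NumPy 1.12 and later) can describe that. The following is a minimal sketch reusing a, b and f from above; the name vf_rows is only illustrative, and since this is still a Python-level loop, no speed-up should be expected from it:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    # stand-in for any generic function of two 1d arrays
    return np.dot(x, y)

# '(n),(n)->()' declares two 1d inputs of equal length and a scalar output,
# so f is called as f(a[i], b[i]) once per row
vf_rows = np.vectorize(f, signature='(n),(n)->()')
result = vf_rows(a, b)
print(result)
```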

I don't know how you tried to use apply_along_axis. It only takes one array. After a lot of set up it ends up doing (for a 2d array like a):

for i in range(3):
    idx = (i, slice(None))
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))

For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.

(apply_along_axis takes arr and *args. It iterates on arr, but passes *args whole to every call.)
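
To make that behaviour concrete, here is a small sketch using the same a, b and f. The first call shows that an extra argument is passed whole to every call. The second is one possible workaround (not part of the original answer): stack the two arrays side by side and split each row inside a hypothetical wrapper. It is still iteration, so it is not expected to be faster than the list comprehension:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    return np.dot(x, y)

# a is iterated row by row; b[0] is handed unchanged to every call,
# so this computes f(a[i], b[0]) for each i
against_first_row = np.apply_along_axis(f, 1, a, b[0])
expected_first_row = np.array([f(row, b[0]) for row in a])

# Workaround: join a and b into one (3, 10) array, then split each row
n = a.shape[1]
stacked = np.hstack([a, b])
row_wise = np.apply_along_axis(lambda r: f(r[:n], r[n:]), 1, stacked)
```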

np.dot(a[np.arange(3)], b[np.arange(3)]) is the same as np.dot(a, b), because indexing with np.arange(3) simply selects all 3 rows.

dot is a matrix product: (3,5) works with (5,3) to produce a (3,3). It handles 1d as a special case (see docs): (3,) with (3,) produces a scalar, the inner product.
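
The shape rules can be checked directly. This sketch, using the same a and b, shows the misaligned call failing and the transposed call producing the full (3,3) table of row-against-row products, whose diagonal holds the row-wise values. Computing the whole table just to keep the diagonal wastes work, so this is an illustration rather than a recommendation:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

try:
    np.dot(a, b)   # (3,5) with (3,5): last axis of a (5) != first axis of b (3)
    raised = False
except ValueError:
    raised = True

full = np.dot(a, b.T)          # (3,5) with (5,3) -> (3,3); full[i, j] = dot(a[i], b[j])
row_wise = np.diagonal(full)   # keep only the i == j entries
inner = np.dot(a[0], b[0])     # two 1d arrays -> scalar inner product
```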

For a truly generic f(x,y), your only alternative to the zipped list comprehension is an index loop like this:

In [18]: c = np.zeros((a.shape[0]))
In [19]: for i in range(a.shape[0]):
    ...:    c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([   60.,   510.,  1460.])

Speed will be similar. (That loop can be moved to compiled code with cython, but I don't think you are ready to dive in that deep.)

As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify large array memory management.

The ideal solution is to rewrite the generic function so it works with 2d arrays, using numpy compiled functions.
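
For the dot example used here, such a rewrite could look like the sketch below. The name f_rows is hypothetical; how a real generic function would be rewritten depends entirely on what it computes:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f_rows(x, y):
    # elementwise product, then a sum over the last axis;
    # works for a pair of 1d arrays and for a pair of 2d arrays (row by row)
    return (x * y).sum(axis=-1)

whole = f_rows(a, b)          # all rows in one compiled operation
single = f_rows(a[0], b[0])   # same function on one pair of rows
```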

In the matrix multiplication case, einsum has implemented a generalized form of 'sum-of-products' in compiled code:

In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([  60,  510, 1460])

matmul also generalizes the product, but works best with 3d arrays:

In [25]: a[:,None,:]@b[:,:,None]    # needs reshape
Out[25]: 
array([[[  60]],

       [[ 510]],

       [[1460]]])
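
The "needs reshape" remark refers to the (3,1,1) shape of that result. A short sketch of one way to get back to a flat (3,) array, using the same a and b:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

# (3,1,5) @ (3,5,1): a batch of three (1,5) by (5,1) products -> (3,1,1)
res3d = a[:, None, :] @ b[:, :, None]

# drop the two length-1 axes to recover one value per row
flat = res3d[:, 0, 0]
```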
