How to apply a generic function over numpy rows?
Problem description
Before you flag this as duplicate, let me explain to you that I read this page and many others and I still haven't found a solution to my problem.
This is the problem I'm having: given two 2D arrays, I want to apply a function F over the two arrays. F takes as input two 1D arrays.
import numpy as np
a = np.arange(15).reshape([3,5])
b = np.arange(30, step=2).reshape([3,5])
# what is the 'numpy' equivalent of the following?
np.array([np.dot(x,y) for x,y in zip(a,b)])
Please note that np.dot is just for demonstration. The real question here is about any generic function F that works over two sets of 1D arrays.
- vectorizing either fails outright with an error or applies the function element-by-element, instead of array-by-array (or row-by-row).
- np.apply_along_axis applies the function iteratively; for example, using the variables defined above, it does F(a[0], b[0]) and combines this with F(a[0], b[1]) and F(a[0], b[2]). This is not what I'm looking for. Ideally, I would want it to stop at just F(a[0], b[0]).
- index slicing / advanced slicing doesn't do what I would like either. For one, if I do something like np.dot(a[np.arange(3)], b[np.arange(3)]), this throws a ValueError saying that shapes (3,5) and (3,5) are not aligned. I don't know how to fix this.
I tried to solve this in every way I could, but the only working solution I've come up with is a list comprehension. I'm worried about the performance cost of using a list comprehension, and I would like to achieve the same effect with a numpy operation, if possible. How do I do this?
Recommended answer
This type of question has been beaten to death on SO, but I'll try to illustrate the issues with your framework:
In [1]: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...:
In [2]: def f(x,y):
   ...:     return np.dot(x,y)
zipped comprehension
The list comprehension approach applies f to the 3 rows of a and b. That is, it iterates on the 2 arrays as though they were lists. At each call, your function gets two 1d arrays. dot can accept other shapes, but for the moment we'll pretend that it only works with a pair of 1d arrays.
In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([ 60, 510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60
vectorize/frompyfunc
np.vectorize iterates over the inputs (with broadcasting, which can be handy) and gives the function scalar values. I'll illustrate with frompyfunc, which returns an object dtype array (and is used by vectorize):
In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]:
array([[0, 2, 8, 18, 32],
[50, 72, 98, 128, 162],
[200, 242, 288, 338, 392]], dtype=object)
So the result is a (3,5) array; incidentally, summing across columns gives the desired result:
In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)
np.vectorize does not make any speed promises.
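As a side note to the vectorize discussion above: np.vectorize also accepts a signature argument, which makes it hand whole 1d rows to the function instead of scalars. The following is only a minimal sketch, assuming a NumPy version that supports signature; the name vf_rows is illustrative and not part of the original answer. It is still a Python-level loop, so the speed caveat applies unchanged.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(0, 30, 2).reshape([3, 5])

def f(x, y):
    # Stand-in for any generic function of two 1d arrays.
    return np.dot(x, y)

# '(n),(n)->()' declares that each call takes two length-n vectors
# and returns a scalar; the leading axis is looped over.
vf_rows = np.vectorize(f, signature='(n),(n)->()')

result = vf_rows(a, b)
print(result)  # one value per row: 60, 510, 1460
```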
I don't know how you tried to use apply_along_axis. It only takes one array. After a lot of setup it ends up doing (for a 2d array like a):
for i in range(3):
    idx = (i, slice(None))
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))
For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.
(apply_along_axis takes arr and *args. It iterates on arr, but uses *args whole.)
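Because apply_along_axis iterates over a single array only, one possible workaround is to join the two inputs side by side and split each row again inside a wrapper. This is a sketch under that assumption, not something the original answer proposes; ab and f_joined are hypothetical names. It remains a Python-level iteration and only changes how the loop is spelled.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(0, 30, 2).reshape([3, 5])

def f(x, y):
    # Stand-in for any generic function of two 1d arrays.
    return np.dot(x, y)

m = a.shape[1]
ab = np.hstack([a, b])  # shape (3, 10): row i holds a[i] followed by b[i]

def f_joined(row):
    # Split the joined row back into its two halves before calling f.
    return f(row[:m], row[m:])

result = np.apply_along_axis(f_joined, 1, ab)
print(result)  # one value per row: 60, 510, 1460
```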
np.dot(a[np.arange(3)], b[np.arange(3)])
is the same as
np.dot(a, b)
dot is a matrix product: (3,5) works with (5,3) to produce a (3,3). It handles 1d as a special case (see docs): (3,) with (3,) produces a scalar.
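The shape rules above can be checked directly. In this small sketch the variable names are illustrative; note that taking the diagonal of the full product computes all 9 row pairings just to keep 3 of them, so it shows the shapes rather than a recommended approach.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(0, 30, 2).reshape([3, 5])

# (3,5) with (3,5): the inner dimensions (5 and 3) do not match.
try:
    np.dot(a, b)
    aligned = True
except ValueError:
    aligned = False

# (3,5) with (5,3) -> (3,3): every row of a paired with every row of b.
full = np.dot(a, b.T)

# Only the diagonal pairs a[i] with b[i].
row_dots = full.diagonal()

# 1d with 1d is the special case: an inner product, i.e. a scalar.
scalar = np.dot(a[0], b[0])
```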
For a truly generic f(x,y), your only alternative to the zipped list comprehension is an index loop like this:
In [18]: c = np.zeros((a.shape[0]))
In [19]: for i in range(a.shape[0]):
    ...:     c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([ 60., 510., 1460.])
Speed will be similar. (That iteration can be moved to compiled code with cython, but I don't think you are ready to dive in that deep.)
As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify memory management for large arrays.
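One way to check this on your own data is a quick timeit comparison. The sketch below assumes random (N,M) inputs with a small N and a large M, and uses the einsum expression from this answer as the whole-array version; the sizes and the names looped and whole_array are arbitrary choices for illustration. Timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 200_000          # few rows, many columns
a = rng.random((N, M))
b = rng.random((N, M))

def f(x, y):
    # Stand-in for any generic function of two 1d arrays.
    return np.dot(x, y)

def looped():
    # N Python-level iterations, each doing a large compiled operation.
    return np.array([f(x, y) for x, y in zip(a, b)])

def whole_array():
    # Single compiled call over the full 2d arrays.
    return np.einsum('ij,ij->i', a, b)

t_loop = timeit.timeit(looped, number=20)
t_whole = timeit.timeit(whole_array, number=20)
print(f"loop: {t_loop:.4f}s  einsum: {t_whole:.4f}s")
```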
The ideal solution is to rewrite the generic function so it works with 2d arrays, using numpy compiled functions.
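For the row-wise dot product used as the running example, such a rewrite could look like the sketch below. The name f_rows is hypothetical, and this only works because a dot product happens to decompose into an elementwise multiply followed by a sum; a different F would need its own rewrite.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(0, 30, 2).reshape([3, 5])

def f_rows(x, y):
    # Elementwise product, then a sum along the last axis.
    # Works for a pair of 1d arrays and for a pair of 2d arrays alike.
    return (x * y).sum(axis=-1)

result = f_rows(a, b)        # all rows at once, no Python loop
single = f_rows(a[0], b[0])  # same function on one pair of rows
print(result, single)
```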
In the matrix multiplication case, einsum implements a generalized form of 'sum-of-products' in compiled code:
In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([ 60, 510, 1460])
matmul also generalizes the product, but works best with 3d arrays:
In [25]: a[:,None,:]@b[:,:,None] # needs reshape
Out[25]:
array([[[ 60]],
[[ 510]],
[[1460]]])
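The (3,1,1) result above still has to be brought back to shape (3,). A minimal sketch of that last step, with prod as an illustrative name:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(0, 30, 2).reshape([3, 5])

# (3,1,5) @ (3,5,1) -> (3,1,1): one 1x1 matrix product per row pair.
prod = a[:, None, :] @ b[:, :, None]

# Drop the two singleton axes to get one value per row.
result = prod[:, 0, 0]
print(result)  # one value per row: 60, 510, 1460
```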