How to apply a generic function over numpy rows?
Problem description
Before you flag this as duplicate, let me explain to you that I read this page and many others and I still haven't found a solution to my problem.
This is the problem I'm having: given two 2D arrays, I want to apply a function F over the two arrays. F takes as input two 1D arrays.
import numpy as np
a = np.arange(15).reshape([3,5])
b = np.arange(30, step=2).reshape([3,5])
# what is the 'numpy' equivalent of the following?
np.array([np.dot(x,y) for x,y in zip(a,b)])
Please note that np.dot is just for demonstration. The real question here is about any generic function F that works over two sets of 1D arrays.
- vectorizing either fails outright with an error or applies the function element-by-element, instead of array-by-array (or row-by-row)
- np.apply_along_axis applies the function iteratively; for example, using the variables defined above, it does F(a[0], b[0]) and combines this with F(a[0], b[1]) and F(a[0], b[2]). This is not what I'm looking for. Ideally, I would want it to stop at just F(a[0], b[0])
- index slicing / advanced slicing doesn't do what I would like either. For one, if I do something like np.dot(a[np.arange(3)], b[np.arange(3)]), this throws a ValueError saying that shapes (3,5) and (3,5) are not aligned. I don't know how to fix this.
I tried to solve this in any way I could, but the only solution I've come up with that works is a list comprehension. I'm worried about the performance cost of the list comprehension, though. I would like to achieve the same effect using a numpy operation, if possible. How do I do this?
Recommended answer
This type of question has been beaten to death on SO, but I'll try to illustrate the issues with your framework:
In [1]: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...:
In [2]: def f(x,y):
   ...:     return np.dot(x,y)
zipped comprehension
The list comprehension approach applies f to the 3 rows of a and b. That is, it iterates over the 2 arrays as though they were lists. At each call, your function gets two 1d arrays. dot can accept other shapes, but for the moment we'll pretend that it only works with a pair of 1d arrays.
In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([ 60, 510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60
vectorize/frompyfunc
np.vectorize iterates over the inputs (with broadcasting, which can be handy), and gives the function scalar values. I'll illustrate with frompyfunc, which returns an object dtype array (and is used by vectorize):
In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]:
array([[0, 2, 8, 18, 32],
[50, 72, 98, 128, 162],
[200, 242, 288, 338, 392]], dtype=object)
So the result is a (3,5) array: f received pairs of scalars, so each entry is just the elementwise product. Incidentally, summing across columns gets the desired result:
In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)
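np.vectorize does not make any speed promises; its documentation notes that it is provided primarily for convenience, not for performance, and that the implementation is essentially a for loop.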
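One option not shown above is the signature argument of np.vectorize, which makes it hand whole 1d rows to the function instead of scalars. A minimal sketch, reusing a, b and f from above (the name vf_rows is just for illustration); it is still a Python-level loop, so don't expect a speedup:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    # stand-in for any generic function of two 1d arrays
    return np.dot(x, y)

# '(n),(n)->()' declares two 1d inputs of equal length and a scalar output,
# so vectorize loops over the leading (row) axis only.
vf_rows = np.vectorize(f, signature='(n),(n)->()')
rows_result = vf_rows(a, b)
print(rows_result)  # same values as the zipped comprehension: 60, 510, 1460
```

This keeps the row-by-row semantics of the list comprehension while adding broadcasting over any leading dimensions.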
I don't know how you tried to use apply_along_axis. It only takes one array. After a lot of setup it ends up doing (for a 2d array like a):
for i in range(3):
    idx = (i, slice(None))
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))
For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.
(apply_along_axis takes arr and *args. It iterates on arr, but uses *args whole.)
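To make that concrete, here is a small sketch using a, b and f from above. The first call shows that an extra argument is passed whole to every call; the second is a workaround of my own (not something apply_along_axis is designed for) that glues the two arrays together so that each 1d slice carries both rows:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    return np.dot(x, y)

# Every row of a is paired with the same extra argument, here b[0]:
# same_arg[i] == f(a[i], b[0])
same_arg = np.apply_along_axis(f, 1, a, b[0])

# Workaround: concatenate along axis 1, then split each row inside a wrapper.
n = a.shape[1]
stacked = np.concatenate([a, b], axis=1)          # shape (3, 10)
paired = np.apply_along_axis(lambda r: f(r[:n], r[n:]), 1, stacked)
```

The second form gives the row-by-row pairing, but it copies both arrays and still iterates in Python.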
np.dot(a[np.arange(3)], b[np.arange(3)]) is the same as np.dot(a, b).
dot is a matrix product: a (3,5) works with a (5,3) to produce a (3,3). It handles 1d as a special case (see docs): a (3,) with a (3,) produces a scalar, the inner product.
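A short sketch of those shape rules with the arrays from the question. Transposing b makes the shapes align, but the (3,3) result contains every row pairing, and only its diagonal holds the wanted f(a[i], b[i]) values:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

try:
    np.dot(a, b)                 # (3,5) with (3,5): 5 does not match 3
except ValueError as err:
    print("ValueError:", err)

full = np.dot(a, b.T)            # (3,5) with (5,3) -> (3,3), all row pairings
paired = np.diagonal(full)       # keeps only the a[i], b[i] terms
scalar = np.dot(a[0], b[0])      # two 1d arrays -> scalar inner product
```

Computing the full product just to take the diagonal wastes work for large N, so this is for illustration rather than a recommendation.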
For a truly generic f(x,y), your only alternative to the zipped list comprehension is an index loop like this:
In [18]: c = np.zeros((a.shape[0]))
In [19]: for i in range(a.shape[0]):
    ...:     c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([ 60., 510., 1460.])
Speed will be similar. (The loop can be moved to compiled code with cython, but I don't think you are ready to dive in that deep.)
As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify memory management for large arrays.
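If you want to check this for your own shapes, a rough timing sketch along these lines may help. The sizes N and M, the names a_big and b_big, and the whole-array alternative are assumptions chosen for illustration; timings depend on the machine, so none are quoted here:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 100_000               # few rows, long rows (assumed sizes)
a_big = rng.random((N, M))
b_big = rng.random((N, M))

def f(x, y):
    return np.dot(x, y)

def looped():
    # N Python-level iterations, each doing a large compiled dot
    return np.array([f(x, y) for x, y in zip(a_big, b_big)])

def whole_array():
    # one whole-array expression, but it allocates an (N, M) temporary
    return (a_big * b_big).sum(axis=1)

print("loop       :", timeit.timeit(looped, number=20))
print("whole array:", timeit.timeit(whole_array, number=20))
```

Swapping N and M (many short rows) is the case where the Python loop overhead would be expected to dominate.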
The ideal solution is to rewrite the generic function so it works with 2d arrays, using numpy compiled functions.
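For the dot example, such a rewrite might look like the sketch below. The name f2d is hypothetical; the idea is to express the per-row computation with whole-array operations and reduce over the last axis, so the same function accepts 1d or 2d inputs:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f2d(x, y):
    # elementwise product, then sum over the last axis;
    # works for a pair of rows or for whole (N, M) arrays
    return (x * y).sum(axis=-1)

per_row = f2d(a, b)          # one value per row
single = f2d(a[0], b[0])     # still valid for a single pair of 1d arrays
```

Whether a rewrite like this is possible depends entirely on what the real F does; it is not available for an arbitrary Python function.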
In the matrix multiplication case, einsum has implemented a generalized form of 'sum-of-products' in compiled code:
In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([ 60, 510, 1460])
matmul also generalizes the product, but works best with 3d arrays:
In [25]: a[:,None,:]@b[:,:,None] # needs reshape
Out[25]:
array([[[ 60]],
[[ 510]],
[[1460]]])
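The reshape mentioned in the comment can be done by indexing away the two size-1 axes. A minimal sketch with the same a and b (the names prod and flat are just for illustration):

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

# (3,1,5) @ (3,5,1) -> (3,1,1): one 1x5 by 5x1 product per row
prod = a[:, None, :] @ b[:, :, None]

# drop the two trailing size-1 axes to get shape (3,)
flat = prod[:, 0, 0]
```

prod.reshape(-1) or prod.squeeze() would do the same job here.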