How to apply a generic function over numpy rows?


Question

Before you flag this as a duplicate, let me explain that I have read this page and many others, and I still haven't found a solution to my problem.

This is the problem I'm having: given two 2D arrays, I want to apply a function F over the two arrays. F takes as input two 1D arrays.

import numpy as np
a = np.arange(15).reshape([3,5])
b = np.arange(30, step=2).reshape([3,5])

# what is the 'numpy' equivalent of the following?
np.array([np.dot(x,y) for x,y in zip(a,b)])

Please note that np.dot is just for demonstration. The real question is about any generic function F that works on two sets of 1D arrays.

  • vectorizing either fails outright with an error or it applies the function element-by-element, instead of array-by-array (or row-by-row)
  • np.apply_along_axis applies the function iteratively; for example, using the variables defined above, it does F(a[0], b[0]) and combines this with F(a[0], b[1]) and F(a[0], b[2]). This is not what I'm looking for. Ideally, I would want it to stop at just F(a[0], b[0]).
  • index slicing / advanced slicing doesn't do what I would like either. For one, if I do something like np.dot(a[np.arange(3)], b[np.arange(3)]) this throws a ValueError saying that shapes (3,5) and (3,5) are not aligned. I don't know how to fix this.

I tried to solve this in any way I could, but the only solution I've come up with that works is using list comprehension. But I'm worried about the cost to performance as a result of using list comprehension. I would like to achieve the same effect using a numpy operation, if possible. How do I do this?

Answer

This type of question has been beaten to death on SO, but I'll try to illustrate the issues with your framework:

In [1]: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...: 
In [2]: def f(x,y):
   ...:     return np.dot(x,y)

zipped comprehension

The list comprehension approach applies f to the 3 rows of a and b. That is, it iterates on the 2 arrays as though they were lists. At each call, your function gets two 1d arrays. dot can accept other shapes, but for the moment we'll pretend that it only works with a pair of 1d arrays.

In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([  60,  510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60
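If F always returns a scalar, a small variant of the zipped comprehension can feed the generator straight into np.fromiter, which skips building the intermediate Python list. This is a minimal sketch, assuming a scalar-returning f. It still calls f once per row from Python, so don't expect a large speedup.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    # stand-in for the generic F: takes two 1d arrays, returns a scalar
    return np.dot(x, y)

# the generator yields one scalar per row pair; count lets numpy preallocate the (3,) output
res = np.fromiter((f(x, y) for x, y in zip(a, b)), dtype=float, count=a.shape[0])
print(res)  # 60.0, 510.0, 1460.0 as floats
```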

vectorize/frompyfunc

np.vectorize iterates over the inputs (with broadcasting, which can be handy) and gives the function scalar values. I'll illustrate with frompyfunc, which returns an object dtype array (and is used by vectorize):

In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]: 
array([[0, 2, 8, 18, 32],
       [50, 72, 98, 128, 162],
       [200, 242, 288, 338, 392]], dtype=object)

So the result is a (3,5) array; incidentally, summing across columns gives the desired result:

In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)

np.vectorize does not make any speed promises.
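np.vectorize also accepts a signature argument, a generalized-ufunc style string (available since NumPy 1.12). It tells vectorize to pass f whole "core" dimensions instead of scalars. With '(n),(n)->()', each call receives one length-n row from each input and returns a scalar. That is the row pairing the question asks for. This is a sketch using the question's f. It is still a Python-level loop, so the no-speed-promises caveat applies here too.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    # generic F: two 1d arrays in, one scalar out
    return np.dot(x, y)

# '(n),(n)->()': the last axis (length n) is handed to f whole,
# and the leading axis (the 3 rows) is looped over, with broadcasting
vf_rows = np.vectorize(f, signature='(n),(n)->()')
res = vf_rows(a, b)
print(res)  # row-wise results: 60, 510, 1460
```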

I don't know how you tried to use apply_along_axis. It only takes one array. After a lot of setup, it ends up doing (for a 2d array like a):

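# roughly what apply_along_axis(func1d, 1, arr, *args, **kwargs) runs for a 2d arr with 3 rows
for i in range(3):
    idx = (i, slice(None))    # selects row i, i.e. arr[i, :]
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))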

For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.

(apply_along_axis takes arr and *args. It iterates on arr, but passes *args whole to every call.)
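Both points are easy to see in a short sketch with the question's arrays. Passing b[0] as an extra argument does not pair rows: every row of a is combined with that same whole argument, so the result is f(a[i], b[0]). To get row pairing out of apply_along_axis, one common workaround is to glue the two arrays side by side and split each combined row inside a wrapper. That workaround is an assumption of this sketch, not something from the answer above. It still iterates, so it is no faster than the comprehension.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

def f(x, y):
    return np.dot(x, y)

# *args are passed whole: each row a[i] is paired with the same b[0]
same_arg = np.apply_along_axis(f, 1, a, b[0])
print(same_arg)  # f(a[0], b[0]), f(a[1], b[0]), f(a[2], b[0])

# workaround: stack a and b column-wise -> shape (3, 10), then split each row back
n = a.shape[1]
ab = np.hstack([a, b])
paired = np.apply_along_axis(lambda row: f(row[:n], row[n:]), 1, ab)
print(paired)  # f(a[i], b[i]) for each i
```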

np.dot(a[np.arange(3)], b[np.arange(3)])

is the same as

np.dot(a, b)

because indexing with np.arange(3) just selects all 3 rows in order.

dot is a matrix product: (3,5) works with (5,3) to produce a (3,3), but (3,5) with (3,5) is not aligned, hence the ValueError. It handles 1d arrays as a special case (see the docs): (5,) with (5,) produces a scalar, their inner product.
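A quick sketch of those shape rules, using the question's a and b, follows. Transposing b turns the product into a valid (3,5)·(5,3) matrix product, and the wanted row-wise values then sit on the diagonal of the (3,3) result. This computes all 9 row pairings only to keep 3 of them, so treat it as an illustration of the shapes rather than a recommended solution.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

# a[np.arange(3)] selects every row in order, so it equals a
assert (a[np.arange(3)] == a).all()

raised = False
try:
    np.dot(a, b)                 # (3,5) with (3,5): inner dims 5 and 3 differ
except ValueError as err:
    raised = True
    print("ValueError:", err)    # shapes (3,5) and (3,5) not aligned ...

full = np.dot(a, b.T)            # (3,5) with (5,3) -> (3,3): every row pairing
print(full.shape)                # (3, 3)
print(np.diag(full))             # only the matching pairs a[i]·b[i]
```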

For a truly generic f(x,y), your only alternative to the zipped list comprehension is an index loop like this:

In [18]: c = np.zeros((a.shape[0]))
In [19]: for i in range(a.shape[0]):
    ...:    c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([   60.,   510.,  1460.])

Speed will be similar. (That loop could be moved to compiled code with cython, but I don't think you are ready to dive in that deep.)
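The index loop above preallocates a float array of shape (N,), which assumes f returns a scalar. For a truly generic f that may return an array, one option is to call f on the first row to learn the output shape and dtype, then allocate. The helper name apply_rowwise is hypothetical, not a NumPy function. This sketch assumes at least one row, and it has the same per-row Python cost as the comprehension.

```python
import numpy as np

def apply_rowwise(f, a, b):
    """Hypothetical helper: out[i] = f(a[i], b[i]) for 2d a, b with equal row counts (at least 1)."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("a and b must have the same number of rows")
    first = np.asarray(f(a[0], b[0]))      # probe the output shape and dtype
    out = np.empty((a.shape[0],) + first.shape, dtype=first.dtype)
    out[0] = first
    for i in range(1, a.shape[0]):
        out[i] = f(a[i], b[i])
    return out

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

dots = apply_rowwise(np.dot, a, b)         # scalar per row -> shape (3,)
# array per row -> shape (3, 2): [row sum of a, row sum of b]
sums = apply_rowwise(lambda x, y: np.array([x.sum(), y.sum()]), a, b)
```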

As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify large array memory management.
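You can roughly check the N-versus-M trade-off on your own machine. Time the zipped comprehension against a compiled row-wise reduction, once for a short-and-wide array and once for a tall-and-narrow array of the same total size. This is a sketch. The shapes, the repeat count, and the use of np.einsum('ij,ij->i', ...) as the compiled baseline are assumptions, and the timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def f(x, y):
    return np.dot(x, y)

def comprehension(a, b):
    # one Python-level call to f per row
    return np.array([f(x, y) for x, y in zip(a, b)])

def compiled(a, b):
    # the same row-wise dot products, done entirely in compiled code
    return np.einsum('ij,ij->i', a, b)

rng = np.random.default_rng(0)
for shape in [(10, 50_000), (50_000, 10)]:   # same element count, N small vs N large
    a = rng.random(shape)
    b = rng.random(shape)
    assert np.allclose(comprehension(a, b), compiled(a, b))
    t_loop = timeit.timeit(lambda: comprehension(a, b), number=3)
    t_comp = timeit.timeit(lambda: compiled(a, b), number=3)
    print(f"{shape}: comprehension {t_loop:.4f}s, einsum {t_comp:.4f}s")
```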

The ideal solution is to rewrite the generic function so it works with 2d arrays, using numpy compiled functions.
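As an illustration of that kind of rewrite, take a hypothetical row function: the Euclidean distance between two 1d arrays, chosen only as an example. The per-row version reduces over the whole array. The rewrite reduces over the last axis only, so the same formula accepts 2d input and handles all rows in compiled code.

```python
import numpy as np

def dist_1d(x, y):
    # hypothetical F written for a single pair of 1d rows
    return np.sqrt(np.sum((x - y) ** 2))

def dist_rows(x, y):
    # same formula, but reduce over the last axis only, so 2d input
    # gives one result per row (and 1d input still gives a scalar)
    return np.sqrt(np.sum((x - y) ** 2, axis=-1))

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

looped = np.array([dist_1d(x, y) for x, y in zip(a, b)])
vectorized = dist_rows(a, b)
assert np.allclose(looped, vectorized)
```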

In the matrix-multiplication case, einsum implements a generalized form of 'sum-of-products' in compiled code:

In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([  60,  510, 1460])
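Here is how to read the subscripts. In 'ij,ij->i', both operands share indices i (row) and j (column), and because j is missing from the output it is summed over. For this particular F, that matches an elementwise product followed by a row sum, which is a handy way to check an einsum string. The sketch below compares the two and also shows what keeping j in the output does.

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

via_einsum = np.einsum('ij,ij->i', a, b)   # sum over j of a[i,j]*b[i,j]
via_sum = (a * b).sum(axis=1)              # the same thing, spelled out
assert (via_einsum == via_sum).all()

# keeping j in the output ('ij,ij->ij') skips the sum: the (3,5) elementwise product
prod = np.einsum('ij,ij->ij', a, b)
assert (prod == a * b).all()
```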

matmul (the @ operator) also generalizes the product, treating all but the last two dimensions as a batch, so it works best with 3d arrays:

In [25]: a[:,None,:]@b[:,:,None]    # needs reshape
Out[25]: 
array([[[  60]],

       [[ 510]],

       [[1460]]])
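The (3,1,1) result can be flattened back to one value per row. Each row of a becomes a (1,5) matrix and each row of b a (5,1) matrix. matmul treats the leading axis of length 3 as a batch, so the 3 small products run in compiled code. A minimal sketch with the question's arrays:

```python
import numpy as np

a = np.arange(15).reshape([3, 5])
b = np.arange(30, step=2).reshape([3, 5])

stacked = a[:, None, :] @ b[:, :, None]   # (3,1,5) @ (3,5,1) -> (3,1,1)
flat = stacked[:, 0, 0]                   # drop the two length-1 axes -> (3,)

# np.matmul is the function form of @
assert (np.matmul(a[:, None, :], b[:, :, None]) == stacked).all()
```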
