Broadcasting a Python function onto NumPy arrays
Question
Let's say we have a particularly simple function like
import scipy as sp
def func(x, y):
    return x + y
This function evidently works for several built-in Python data types of x and y, such as strings, lists, ints and floats, and it also works for arrays. Since we are particularly interested in arrays, we consider two arrays:
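import numpy as np  # NumPy supplies array and newaxis directly
x = np.array([-2, -1, 0, 1, 2])
y = np.array([-2, -1, 0, 1, 2])
xx = x[:, np.newaxis]  # column, shape (5, 1)
yy = y[np.newaxis, :]  # row, shape (1, 5)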
>>> func(xx, yy)
returns
array([[-4, -3, -2, -1,  0],
       [-3, -2, -1,  0,  1],
       [-2, -1,  0,  1,  2],
       [-1,  0,  1,  2,  3],
       [ 0,  1,  2,  3,  4]])
just as we would expect.
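To make the broadcasting step explicit, here is a minimal sketch using NumPy directly (imported as `np`, which is an assumption of this example). It shows the shapes of the two operands and the shape they are stretched to; the names `out_shape` and `table` are made up for illustration.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = np.array([-2, -1, 0, 1, 2])

xx = x[:, np.newaxis]  # shape (5, 1): a column
yy = y[np.newaxis, :]  # shape (1, 5): a row

# np.broadcast reports the common shape both operands are stretched to
out_shape = np.broadcast(xx, yy).shape
print(xx.shape, yy.shape, out_shape)

# Element [i, j] of the broadcast sum is x[i] + y[j]
table = xx + yy
print(table[0, 4] == x[0] + y[4])
```

Because the column and the row are each repeated along the other's axis, any function built purely from element-wise operators (such as `+`) produces the full 5x5 table without an explicit loop.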
Now, what if one wants to pass arrays as inputs to the following function?
def func2(x, y):
    if x > y:
        return x + y
    else:
        return x - y
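Executing func2(xx, yy) raises a ValueError, because the if statement needs a single truth value and the array x > y does not have one.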
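The failure can be reproduced in a few lines. This is a sketch that assumes NumPy is imported as `np`; the variable `message` exists only so the error text can be inspected.

```python
import numpy as np

def func2(x, y):
    if x > y:
        return x + y
    else:
        return x - y

x = np.array([-2, -1, 0, 1, 2])
xx = x[:, np.newaxis]
yy = x[np.newaxis, :]

message = None
try:
    func2(xx, yy)  # `if` asks for bool(xx > yy), which is undefined for a 5x5 array
except ValueError as exc:
    message = str(exc)
    print(message)
```

The comparison `xx > yy` itself broadcasts fine; it is only the conversion of the resulting boolean array to a single `True`/`False` that fails.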
The first obvious method that comes to mind is the sp.vectorize function, that is, NumPy's np.vectorize. This method, however, has proved to be not very efficient. Can anyone think of a more robust way of broadcasting an arbitrary function onto NumPy arrays?
If rewriting the code in an array-friendly fashion is the only way, it would help if you could mention that here too.
Recommended answer
np.vectorize is a general way to convert Python functions that operate on numbers into NumPy functions that operate on ndarrays.
However, as you point out, it isn't very fast, since it uses a Python loop "under the hood".
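For reference, a minimal sketch of the np.vectorize route applied to the scalar func2 from the question. The wrapper name `vfunc2` is made up for this example.

```python
import numpy as np

def func2(x, y):
    # scalar version from the question: works on one pair of numbers
    if x > y:
        return x + y
    else:
        return x - y

# Wrap the scalar function; the wrapper broadcasts its arguments and
# calls func2 once per element pair in a Python-level loop.
vfunc2 = np.vectorize(func2)

x = np.array([-2, -1, 0, 1, 2])
xx = x[:, np.newaxis]
yy = x[np.newaxis, :]

result = vfunc2(xx, yy)
print(result)
```

This is convenient because func2 itself is untouched, but each of the 25 elements still costs one Python function call, which is where the slowness comes from.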
To achieve better speed, you have to hand-craft a function that expects NumPy arrays as input and takes advantage of NumPy's array operations:
import numpy as np
def func2(x, y):
    return np.where(x > y, x + y, x - y)
x = np.array([-2, -1, 0, 1, 2])
y = np.array([-2, -1, 0, 1, 2])
xx = x[:, np.newaxis]
yy = y[np.newaxis, :]
print(func2(xx, yy))
# [[ 0 -1 -2 -3 -4]
#  [-3  0 -1 -2 -3]
#  [-2 -1  0 -1 -2]
#  [-1  0  1  0 -1]
#  [ 0  1  2  3  0]]
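It may help to see what np.where is given in this rewrite. The sketch below splits the one-liner into its parts; the intermediate names `mask`, `sums` and `diffs` are introduced only for illustration.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
xx = x[:, np.newaxis]
yy = x[np.newaxis, :]

mask = xx > yy    # boolean 5x5 array replacing the scalar `if` test
sums = xx + yy    # the "if" branch, computed for every element
diffs = xx - yy   # the "else" branch, also computed for every element

# Pick from sums where mask is True and from diffs elsewhere
result = np.where(mask, sums, diffs)
print(mask.dtype, sums.shape, diffs.shape, result.shape)
```

Note that both branches are evaluated in full before the selection happens, unlike the scalar if/else, which evaluates only one of them per call.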
Regarding performance:
test.py:
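import numpy as np

def func2a(x, y):
    # select element-wise between the two precomputed branches
    return np.where(x > y, x + y, x - y)

def func2b(x, y):
    # fill an empty result through a boolean mask and its complement
    ind = x > y
    z = np.empty(ind.shape, dtype=x.dtype)
    z[ind] = (x + y)[ind]
    z[~ind] = (x - y)[~ind]
    return z

def func2c(x, y):
    # x, y = x[:, None], y[None, :]
    # start from x + y and overwrite the entries where x <= y
    A, L = x + y, x <= y
    A[L] = (x - y)[L]
    return A

N = 40  # array length; the timings below were taken at other values of N
x = np.random.random(N)
y = np.random.random(N)
xx = x[:, np.newaxis]
yy = y[np.newaxis, :]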
Running:
With N=30:
% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
1000 loops, best of 3: 219 usec per loop
% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
1000 loops, best of 3: 488 usec per loop
% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
1000 loops, best of 3: 248 usec per loop
With N=1000:
% python -mtimeit -s'import test' 'test.func2a(test.xx,test.yy)'
10 loops, best of 3: 93.7 msec per loop
% python -mtimeit -s'import test' 'test.func2b(test.xx,test.yy)'
10 loops, best of 3: 367 msec per loop
% python -mtimeit -s'import test' 'test.func2c(test.xx,test.yy)'
10 loops, best of 3: 186 msec per loop
This seems to suggest that func2a is faster than func2c (slightly at N=30, about twice as fast at N=1000), and that func2b is by far the slowest, roughly two to four times slower than func2a.
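The same kind of comparison can be run from inside Python with the standard timeit module, which also makes it easy to include np.vectorize. This is only a sketch: the size N=200, the repeat counts, and the names `func2_scalar`, `candidates` and `timings` are assumptions of this example, and the numbers it prints depend on the machine and NumPy version.

```python
import timeit
import numpy as np

def func2_scalar(x, y):
    # scalar version from the question
    if x > y:
        return x + y
    else:
        return x - y

def func2a(x, y):
    # array version from the answer
    return np.where(x > y, x + y, x - y)

N = 200  # hypothetical size, chosen only to keep the run short
rng = np.random.default_rng(0)
x = rng.random(N)
y = rng.random(N)
xx = x[:, np.newaxis]
yy = y[np.newaxis, :]

candidates = {
    "np.where": func2a,
    "np.vectorize": np.vectorize(func2_scalar),
}

timings = {}
for name, f in candidates.items():
    # best of 3 repeats, 5 calls each, reported per call in seconds
    best = min(timeit.repeat(lambda: f(xx, yy), number=5, repeat=3)) / 5
    timings[name] = best
    print(f"{name}: {best * 1e6:.1f} usec per call")
```

Before trusting any timing, it is worth checking that the candidates agree, for example with `np.allclose(candidates["np.where"](xx, yy), candidates["np.vectorize"](xx, yy))`.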