Use numpy.frompyfunc to add broadcasting to a Python function with argument
Question
From an array like db (which will be approximately (1e6, 300)) and a mask = [1, 0, 1] vector, I define the target as a 1 in the first column.
I want to create an out vector that consists of ones where the corresponding row in db matches the mask and target == 1, and zeros everywhere else.
import numpy as np

db = np.array([       # out for mask = [1, 0, 1]
    # target, vector
    [1, 1, 0, 1],     # 1
    [0, 1, 1, 1],     # 0 (fits the mask but target == 0)
    [0, 0, 1, 0],     # 0
    [1, 1, 0, 1],     # 1
    [0, 1, 1, 0],     # 0
    [1, 0, 0, 0],     # 0
])
I have defined a vline function that applies the mask to each array row using np.array_equal(mask, mask & vector) to check that vectors such as 101 and 111 fit the mask, then retains only the indices where target == 1.
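To make the "fits the mask" test concrete, here is a minimal sketch of the check on a few hand-picked vectors. The candidates dictionary and the fits name are illustrative only and do not appear in the original question.

```python
import numpy as np

mask = np.array([1, 0, 1])

# Hypothetical sample vectors, keyed by their bit pattern
candidates = {
    "101": np.array([1, 0, 1]),
    "111": np.array([1, 1, 1]),
    "011": np.array([0, 1, 1]),
    "100": np.array([1, 0, 0]),
}

# A vector fits when every position set in the mask is also set in the vector
fits = {name: bool(np.array_equal(mask, mask & vec))
        for name, vec in candidates.items()}
print(fits)
```

Only the vectors that have a 1 wherever the mask has a 1 pass the check; extra ones in unmasked positions are ignored.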
out is initialized to all zeros:
out = [0, 0, 0, 0, 0, 0]
The vline function is defined as:
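def vline(idx, mask):
    # Split row idx of the global db into its target flag and its vector
    line = db[idx]
    target, vector = line[0], line[1:]
    # The vector fits if every bit set in mask is also set in the vector
    if np.array_equal(mask, mask & vector):
        if target == 1:
            out[idx] = 1   # modifies the global out in place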
I get the correct result by applying this function line-by-line in a for loop:
def check_mask(db, out, mask=[1, 0, 1]):
    # idx_db to iterate over db lines without enumerate
    for idx in np.arange(db.shape[0]):
        vline(idx, mask=mask)
    return out

assert check_mask(db, out, [1, 0, 1]) == [1, 0, 0, 1, 0, 0]  # it works!
Now I want to vectorize vline by creating a ufunc:
ufunc_vline = np.frompyfunc(vline, 2, 1)
out = [0, 0, 0, 0, 0, 0]
ufunc_vline(db, [1, 0, 1])
print(out)
But the ufunc complains about broadcasting inputs with those shapes:
In [217]: ufunc_vline(db, [1, 0, 1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-217-9008ebeb6aa1> in <module>()
----> 1 ufunc_vline(db, [1, 0, 1])
ValueError: operands could not be broadcast together with shapes (6,4) (3,)
In [218]:
Recommended answer
Converting vline to a numpy ufunc fundamentally doesn't make sense, since ufuncs are always applied to numpy arrays in an elementwise fashion. Because of this, the input arguments must either have the same shape, or must be broadcastable to the same shape. You are passing two arrays with incompatible shapes to your ufunc_vline function (db.shape == (6, 4) and mask.shape == (3,)), hence the ValueError you are seeing.
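The shape rule can be checked without building any arrays. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; the variable names are illustrative.

```python
import numpy as np

db_shape = (6, 4)
mask_shape = (3,)

# Trailing dimensions 4 and 3 differ and neither is 1, so broadcasting fails
try:
    np.broadcast_shapes(db_shape, mask_shape)
    compatible = True
except ValueError:
    compatible = False
print(compatible)

# Dropping the target column leaves (6, 3), which does broadcast with (3,)
vector_shape = np.broadcast_shapes((6, 3), mask_shape)
print(vector_shape)
```

This is why an expression such as mask & db[:, 1:] broadcasts, while passing the full db together with mask does not.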
There are a couple of other issues with ufunc_vline:
- np.frompyfunc(vline, 2, 1) specifies that vline should return a single output argument, whereas vline actually returns nothing (but modifies out in place).
- You are passing db as the first argument to ufunc_vline, whereas vline expects the first argument to be idx, which is used as an index into the rows of db.
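For illustration only, here is one way both issues could be addressed while still using np.frompyfunc. This is a sketch, not a recommendation: vline_value and row_ufunc are hypothetical names, the function returns a value instead of mutating out, and the mask is bound in a closure so that the ufunc takes a single input (the row indices).

```python
import numpy as np

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
mask = np.array([1, 0, 1])

def vline_value(idx, mask):
    # Return 1 when row idx fits the mask and has target == 1, else 0
    target, vector = db[idx, 0], db[idx, 1:]
    return int(target == 1 and np.array_equal(mask, mask & vector))

# One input (a row index), one output; mask is captured by the lambda
row_ufunc = np.frompyfunc(lambda idx: vline_value(idx, mask), 1, 1)

# frompyfunc returns an object array, so convert it to integers
out = row_ufunc(np.arange(db.shape[0])).astype(int)
print(out)
```

The ufunc is applied elementwise over the index array, so each call still runs in the Python interpreter.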
Also, bear in mind that creating a ufunc from a Python function using np.frompyfunc will not yield any noticeable performance benefit over a standard Python for loop. To see any serious improvement you would probably need to code the ufunc in a low-level language such as C (see this example in the documentation).
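A rough way to check this claim on your own machine is to time the approaches side by side. The sketch below uses random data of an assumed size (2000 rows) and hypothetical helper names; actual timings depend on hardware and NumPy version, so no figures are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(2000, 4))   # assumed test data, not from the question
mask = np.array([1, 0, 1])

def row_matches(idx):
    # Per-row check executed in the Python interpreter
    target, vector = db[idx, 0], db[idx, 1:]
    return int(target == 1 and np.array_equal(mask, mask & vector))

def with_loop():
    return np.array([row_matches(i) for i in range(db.shape[0])])

ufunc_rows = np.frompyfunc(row_matches, 1, 1)

def with_frompyfunc():
    return ufunc_rows(np.arange(db.shape[0])).astype(int)

def with_boolean_ops():
    # Whole-array boolean operations, no per-row Python call
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)

for fn in (with_loop, with_frompyfunc, with_boolean_ops):
    print(fn.__name__, timeit.timeit(fn, number=5))
```

All three functions should produce the same result; only the time taken differs.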
Having said that, your vline function can be easily vectorized using standard boolean array operations:
def vline_vectorized(db, mask):
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)
For example:
db = np.array([       # out for mask = [1, 0, 1]
    # target, vector
    [1, 1, 0, 1],     # 1
    [0, 1, 1, 1],     # 0 (fits the mask but target == 0)
    [0, 0, 1, 0],     # 0
    [1, 1, 0, 1],     # 1
    [0, 1, 1, 0],     # 0
    [1, 0, 0, 0],     # 0
])
mask = np.array([1, 0, 1])
print(repr(vline_vectorized(db, mask)))
# array([1, 0, 0, 1, 0, 0])