Use numpy.frompyfunc to add broadcasting to a python function with argument

Question

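I have an array like db below (the real one will be approximately of shape (1e6, 300)) and a mask vector such as mask = [1, 0, 1]. The first column of each row is the target; the remaining columns form the vector that is compared against the mask.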

I want to create an out vector that consists of ones where the corresponding row in db matches the mask and target==1, and zeros everywhere else.

import numpy as np

db = np.array([       # out for mask = [1, 0, 1]
# target,  vector     #
  [1,      1, 0, 1],  # 1
  [0,      1, 1, 1],  # 0 (fit to mask but target == 0)
  [0,      0, 1, 0],  # 0
  [1,      1, 0, 1],  # 1
  [0,      1, 1, 0],  # 0
  [1,      0, 0, 0],  # 0
  ])

I have defined a vline function that applies the mask to each row of db, using np.array_equal(mask, mask & vector) to check that vectors such as 101 and 111 fit the mask, and then keeps only the indices where target == 1.
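The fit test can be checked on its own. Below is a minimal sketch, assuming the entries are 0/1 (`fits` is a hypothetical helper name, not part of the question). `mask & vector` zeroes every position the mask does not select, so the result equals `mask` exactly when `vector` has a 1 wherever `mask` has a 1:

```python
import numpy as np

mask = np.array([1, 0, 1])

def fits(mask, vector):
    # mask & vector keeps only the positions selected by the mask; the result
    # equals mask only if vector has a 1 at every position where mask has a 1
    return np.array_equal(mask, mask & vector)

print(fits(mask, np.array([1, 0, 1])))  # True
print(fits(mask, np.array([1, 1, 1])))  # True (extra 1s are ignored)
print(fits(mask, np.array([0, 1, 0])))  # False
print(fits(mask, np.array([1, 1, 0])))  # False (last selected position is 0)
```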

out is initialized as a list of zeros:

out = [0, 0, 0, 0, 0, 0]

The vline function is defined as:

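def vline(idx, mask):
    # Reads the global db and writes into the global out
    line = db[idx]
    target, vector = line[0], line[1:]
    # The vector fits if it has a 1 wherever the mask has a 1
    if np.array_equal(mask, mask & vector):
        if target == 1:
            out[idx] = 1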

I get the correct result by applying this function line-by-line in a for loop:

def check_mask(db, out, mask=[1, 0, 1]):
    # Iterate over the row indices of db (vline itself reads the global db)
    for idx in np.arange(db.shape[0]):
        vline(idx, mask=mask)
    return out

assert check_mask(db, out, [1, 0, 1]) == [1, 0, 0, 1, 0, 0] # it works!
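The loop above works only because vline reads and writes the globals db and out. The db argument of check_mask is never seen by vline. Below is a minimal sketch of the same row-by-row logic with everything passed explicitly. `check_mask_loop` is a hypothetical name, not from the question. The function returns a fresh NumPy array instead of mutating a list:

```python
import numpy as np

def check_mask_loop(db, mask):
    """Row-by-row reference version: returns a new 0/1 array
    instead of writing into a global `out`."""
    mask = np.asarray(mask)
    out = np.zeros(db.shape[0], dtype=int)
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if target == 1 and np.array_equal(mask, mask & vector):
            out[idx] = 1
    return out

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
result = check_mask_loop(db, [1, 0, 1])
print(result)  # [1 0 0 1 0 0]
# `==` between arrays is elementwise, so compare with np.array_equal
assert np.array_equal(result, [1, 0, 0, 1, 0, 0])
```

Because the result is an array rather than a list, `assert result == [...]` would raise an "ambiguous truth value" error. That is why the comparison uses `np.array_equal`.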

Now I want to vectorize vline by creating a ufunc:

ufunc_vline = np.frompyfunc(vline, 2, 1)
out = [0, 0, 0, 0, 0, 0]
ufunc_vline(db, [1, 0, 1])
print(out)

But the ufunc complains about broadcasting inputs with those shapes:

In [217]:     ufunc_vline(db, [1, 0, 1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-217-9008ebeb6aa1> in <module>()
----> 1 ufunc_vline(db, [1, 0, 1])
ValueError: operands could not be broadcast together with shapes (6,4) (3,)
In [218]:

Answer

vline转换为可广播为相同形状.您正在将形状不兼容的两个数组传递给ufunc_vline函数(db.shape == (6, 4)mask.shape == (3,)),因此您看到的是ValueError.

Converting vline to a numpy ufunc fundamentally doesn't make sense, since ufuncs are always applied to numpy arrays in an elementwise fashion. Because of this, the input arguments must either have the same shape, or must be broadcastable to the same shape. You are passing two arrays with incompatible shapes to your ufunc_vline function (db.shape == (6, 4) and mask.shape == (3,)), hence the ValueError you are seeing.
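The shape clash can be reproduced without calling the ufunc at all. This is a sketch assuming NumPy 1.20 or later, where `np.broadcast_shapes` is available. Trailing dimensions are compared right to left, so 4 against 3 fails. Dropping the target column, as the vectorized version further down does, leaves a shape that broadcasts cleanly:

```python
import numpy as np

db = np.zeros((6, 4), dtype=int)
mask = np.array([1, 0, 1])

# Last dimensions compared: 4 vs 3 -> neither is 1, so this is incompatible
try:
    np.broadcast_shapes(db.shape, mask.shape)
except ValueError as exc:
    print("cannot broadcast:", exc)

# (6, 3) vs (3,) is compatible: the same mask is reused for every row
print(np.broadcast_shapes(db[:, 1:].shape, mask.shape))  # (6, 3)
```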

There are several other problems with ufunc_vline:

  • np.frompyfunc(vline, 2, 1) specifies that vline should return a single output argument, whereas vline actually returns nothing (but modifies out in place).

  • You are passing db as the first argument to ufunc_vline, whereas vline expects the first argument to be idx, which is used as an index into the rows of db.
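For illustration only, here is a minimal sketch of a frompyfunc version with both problems fixed. The wrapped function takes a single row index and returns its result. `row_matches` and `ufunc_row` are hypothetical names. db and mask are still read as globals, and as noted below this gains no speed over the plain loop:

```python
import numpy as np

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
mask = np.array([1, 0, 1])

def row_matches(idx):
    # frompyfunc calls this once per element, i.e. with one scalar index,
    # and the value is *returned* rather than written into a global `out`
    target, vector = db[idx, 0], db[idx, 1:]
    return int(target == 1 and np.array_equal(mask, mask & vector))

# One input (the row index), one output
ufunc_row = np.frompyfunc(row_matches, 1, 1)

# frompyfunc ufuncs return object arrays, so cast back to int
out = ufunc_row(np.arange(db.shape[0])).astype(int)
print(out)  # [1 0 0 1 0 0]
```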

Also, bear in mind that creating a ufunc from a Python function using np.frompyfunc will not yield any noticeable performance benefit over a standard Python for loop, because the Python function is still called once per element. To see any serious improvement you would probably need to code the ufunc in a low-level language such as C (see the example in the NumPy documentation on writing your own ufunc).
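The performance claim can be checked with a small benchmark. This is a sketch on hypothetical random 0/1 data (10,000 rows rather than the question's ~1e6) with a hypothetical mask containing three 1s. No timings are quoted here because they depend on the machine and NumPy version. One would expect the loop and the frompyfunc version to land close to each other, with the boolean-array version far ahead:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(10_000, 30))   # hypothetical 0/1 data
mask = np.zeros(29, dtype=int)
mask[[2, 7, 11]] = 1                         # hypothetical mask with three 1s

def with_loop():
    return np.array([int(row[0] == 1 and np.array_equal(mask, mask & row[1:]))
                     for row in db])

def row_matches(idx):
    return int(db[idx, 0] == 1 and np.array_equal(mask, mask & db[idx, 1:]))

ufunc_row = np.frompyfunc(row_matches, 1, 1)

def with_frompyfunc():
    return ufunc_row(np.arange(db.shape[0])).astype(int)

def vectorized():
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)

# The three versions must agree before their timings mean anything
assert np.array_equal(with_loop(), with_frompyfunc())
assert np.array_equal(with_loop(), vectorized())

for fn in (with_loop, with_frompyfunc, vectorized):
    print(f"{fn.__name__:16s} {timeit.timeit(fn, number=3):.3f} s")
```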

Having said that, your vline function can be easily vectorized using standard boolean array operations:

import numpy as np

def vline_vectorized(db, mask):
    # target column AND "row has a 1 wherever mask has a 1", for all rows at once
    return db[:, 0] & np.all((mask & db[:, 1:]) == mask, axis=1)

For example:

db = np.array([       # out for mask = [1, 0, 1]
# target,  vector     #
  [1,      1, 0, 1],  # 1
  [0,      1, 1, 1],  # 0 (fit to mask but target == 0)
  [0,      0, 1, 0],  # 0
  [1,      1, 0, 1],  # 1
  [0,      1, 1, 0],  # 0
  [1,      0, 0, 0],  # 0
  ])

mask = np.array([1, 0, 1])

print(repr(vline_vectorized(db, mask)))
# array([1, 0, 0, 1, 0, 0])
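At the question's scale, `mask & db[:, 1:]` builds a full temporary of the same shape as the data. For a (1e6, 300) int64 array that is about 2.4 GB on top of db itself. A possible alternative is sketched below. It assumes db holds only 0s and 1s, and `vline_selected_columns` is a hypothetical name. It inspects only the columns where the mask is 1 and stores db as uint8 (1 byte per entry instead of 8):

```python
import numpy as np

def vline_selected_columns(db, mask):
    """Alternative to vline_vectorized, assuming db contains only 0/1.

    Only the columns where mask == 1 are compared, so the temporary
    arrays have shape (n_rows, number_of_ones_in_mask).
    """
    cols = np.flatnonzero(mask) + 1               # +1 skips the target column
    is_target = db[:, 0] == 1
    fits_mask = np.all(db[:, cols] == 1, axis=1)  # all-zero mask -> every row fits
    return (is_target & fits_mask).astype(np.uint8)

db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
], dtype=np.uint8)
mask = np.array([1, 0, 1])

print(vline_selected_columns(db, mask))  # [1 0 0 1 0 0]
```

The gain depends on how sparse the mask is: with few 1s the temporaries stay small, while a dense mask brings the cost back close to that of vline_vectorized.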
