Conditional vectorized calculation with numpy arrays without using direct masking


Question

Following up on another question:

import numpy as np

repeat=int(1e5)
r_base = np.linspace(0,4,5)
a_base = 2
np.random.seed(0)
r_mat = r_base * np.random.uniform(0.9,1.1,(repeat,5))

a_array = a_base * np.random.uniform(0.9,1.1, repeat)


# original slow approach
def func_vetorized_level1(r_row, a):
    if r_row.mean()>2:
        result = np.where((r_row >= a), r_row - a, np.nan)
    else:
        result = np.where((r_row >= a), r_row + a, 0)
    return result
# try to broadcast this func to every row of r_mat using list comprehension
def func_list_level2(r_mat, a_array):
    res_mat = np.array([func_vetorized_level1(this_r_row, this_a) 
                        for this_r_row, this_a in zip(r_mat, a_array)])
    return res_mat

# faster with direct masking, but performs unnecessary extra calculation
def f_faster(r_mat,a_array):
    a = a_array[:, None]  # to column vector

    row_mask = (r_mat.mean(axis=1) > 2)[:,None]
    elem_mask = r_mat >= a

    out = np.empty_like(r_mat)

    out[row_mask & elem_mask] = (r_mat - a)[row_mask & elem_mask]
    out[~row_mask & elem_mask] = (r_mat + a)[~row_mask & elem_mask]
    out[row_mask & ~elem_mask] = np.nan
    out[~row_mask & ~elem_mask] = 0
    
    return out

# fastest with ufunc in numpy as suggested by @mad_physicist
def f_fastest(r_mat,a_array):
    a = a_array[:, None]  # to column vector

    row_mask = (r_mat.mean(axis=1) > 2)[:,None]
    elem_mask = r_mat >= a

    out = np.empty_like(r_mat)


    np.subtract(r_mat, a, out=out, where=row_mask & elem_mask)
    np.add(r_mat, a, out=out, where=~row_mask & elem_mask)
    out[row_mask & ~elem_mask] = np.nan
    out[~row_mask & ~elem_mask] = 0
    
    return out

I would like to ask whether a user-defined function can be used here while still taking advantage of the fastest approach. I considered using indexing but found it challenging, because slicing with [row_ind, col_ind] returns a 1D array of the selected elements. I see that the sliced result can be put back into a matrix using reshape, but is there a more elegant way? Ideally, the r_mat + a operation could be replaced by a user-defined function.

Recommended answer

You absolutely can have a vectorized solution with a user-defined function, as long as that function is vectorized to work element-wise on a 1D array (which should be the case for anything written with numpy functions out of the box).

Let's say you have r_mat as an (m, n) matrix and a_array as an (m,) vector. You can write your function to accept hooks. Each hook can be a constant or a callable. If it is a callable, it gets called with two arrays of the same length and must return a third array of that same length. You can change that contract at will to include indices or anything else you need:

def f(r_mat, a_array, hook11, hook01, hook10, hook00):
    a = a_array[:, None]  # to column vector

    row_mask = (r_mat.mean(axis=1) > 2)[:,None]
    elem_mask = r_mat >= a

    out = np.empty_like(r_mat)

    def apply_hook(mask, hook):
        r, c = np.nonzero(mask)
        out[r, c] = hook(r_mat[r, c], a_array[r]) if callable(hook) else hook

    apply_hook(row_mask & elem_mask, hook11)
    apply_hook(~row_mask & elem_mask, hook01)
    apply_hook(row_mask & ~elem_mask, hook10)
    apply_hook(~row_mask & ~elem_mask, hook00)

    return out

The current configuration in your code would be called like this:

f(r_mat, a_array, np.subtract, np.add, np.nan, 0)
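As a sanity check, the call above can be compared against the row-by-row logic from the question. The following is a minimal sketch, not a benchmark: the helper name reference and the smaller sample size of 1000 rows are assumptions made for illustration, and f is repeated only so the snippet runs on its own.

```python
import numpy as np

def f(r_mat, a_array, hook11, hook01, hook10, hook00):
    a = a_array[:, None]  # to column vector
    row_mask = (r_mat.mean(axis=1) > 2)[:, None]
    elem_mask = r_mat >= a
    out = np.empty_like(r_mat)

    def apply_hook(mask, hook):
        # r and c are 1D index arrays of the selected positions
        r, c = np.nonzero(mask)
        out[r, c] = hook(r_mat[r, c], a_array[r]) if callable(hook) else hook

    apply_hook(row_mask & elem_mask, hook11)
    apply_hook(~row_mask & elem_mask, hook01)
    apply_hook(row_mask & ~elem_mask, hook10)
    apply_hook(~row_mask & ~elem_mask, hook00)
    return out

def reference(r_mat, a_array):
    # hypothetical helper: the slow per-row logic from the question
    rows = []
    for r_row, a in zip(r_mat, a_array):
        if r_row.mean() > 2:
            rows.append(np.where(r_row >= a, r_row - a, np.nan))
        else:
            rows.append(np.where(r_row >= a, r_row + a, 0))
    return np.array(rows)

rng = np.random.default_rng(0)
r_mat = np.linspace(0, 4, 5) * rng.uniform(0.9, 1.1, (1000, 5))
a_array = 2 * rng.uniform(0.9, 1.1, 1000)

result = f(r_mat, a_array, np.subtract, np.add, np.nan, 0)
expected = reference(r_mat, a_array)

# equal_nan=True treats NaN entries at the same position as matching
matches = np.allclose(result, expected, equal_nan=True)
print(matches)
```

Because every position falls into exactly one of the four masks, out is fully written even though it starts from np.empty_like.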

Let's say you wanted to do something more complex than np.subtract. You could do, for example:

def my_complicated_func(r, a):
    # r and a are equal-length 1D arrays; the result must have the same length
    return np.cumsum(r) - 3 * r // a + np.exp(a)

f(r_mat, a_array, my_complicated_func, np.add, np.nan, 0.0)

The key is that my_complicated_func operates on arrays. It will be passed the selected elements of r_mat as a 1D array, together with the matching elements of a_array, each repeated as many times as necessary for its row.
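To make the shape of those arguments concrete, here is a small sketch of what a hook receives. The 2x3 values are made up for illustration. It also shows that the 1D result can be written straight back with the same index pair, so no reshape is needed.

```python
import numpy as np

# made-up sample data, small enough to follow by hand
r_mat = np.array([[1.0, 5.0, 6.0],
                  [0.5, 1.0, 3.0]])
a_array = np.array([2.0, 1.0])

mask = r_mat >= a_array[:, None]
r, c = np.nonzero(mask)      # row and column indices of the True entries

r_selected = r_mat[r, c]     # 1D array of the selected elements
a_selected = a_array[r]      # each a repeated once per selected element in its row

out = np.zeros_like(r_mat)
out[r, c] = r_selected - a_selected  # 1D result lands back in its 2D positions

print(r_selected)  # [5. 6. 1. 3.]
print(a_selected)  # [2. 2. 1. 1.]
print(out)
```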

You could also do the same thing with a function that is aware of the index of each location. Just call hook as hook(r_mat[r, c], a_array[r], r, c). The hook functions must then accept two additional arguments. The original code would be equivalent to:

f(r_mat, a_array, lambda r, a, *args: np.subtract(r, a), lambda r, a, *args: np.add(r, a), np.nan, 0)
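A possible version of the index-aware variant is sketched below. The names f_indexed and scale_by_column and the sample data are hypothetical, chosen only to show a hook that actually uses the column index.

```python
import numpy as np

def f_indexed(r_mat, a_array, hook11, hook01, hook10, hook00):
    # hypothetical variant of f: callables also receive the row and column indices
    a = a_array[:, None]
    row_mask = (r_mat.mean(axis=1) > 2)[:, None]
    elem_mask = r_mat >= a
    out = np.empty_like(r_mat)

    def apply_hook(mask, hook):
        r, c = np.nonzero(mask)
        if callable(hook):
            out[r, c] = hook(r_mat[r, c], a_array[r], r, c)
        else:
            out[r, c] = hook

    apply_hook(row_mask & elem_mask, hook11)
    apply_hook(~row_mask & elem_mask, hook01)
    apply_hook(row_mask & ~elem_mask, hook10)
    apply_hook(~row_mask & ~elem_mask, hook00)
    return out

def scale_by_column(r, a, rows, cols):
    # illustrative hook: weight the difference by the 1-based column number
    return (r - a) * (cols + 1)

r_mat = np.array([[1.0, 3.0, 5.0],    # row mean 3.0, above the threshold of 2
                  [0.5, 1.0, 3.0]])   # row mean 1.5, not above the threshold
a_array = np.array([2.0, 1.0])

out = f_indexed(r_mat, a_array,
                scale_by_column,
                lambda r, a, *args: np.add(r, a),
                np.nan, 0)
print(out)
```

Hooks that do not need the indices can swallow them with *args, as the lambda in the call does.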
