每行的 Bin 元素 - NumPy 的矢量化 2D Bincount [英] Bin elements per row - Vectorized 2D Bincount for NumPy

查看：27 发布时间：2021/12/8 9:19:48 python performance numpy matrix vectorization

本文介绍了每行的 Bin 元素 - NumPy 的矢量化 2D Bincount的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个带有整数值的 NumPy 数组.矩阵的值范围从 0 到矩阵中的最大元素(换句话说，从 0 到最大数据元素的所有数字都在其中).我需要构建有效(有效意味着快速完全矢量化的解决方案)来搜索每行中的元素数量并根据矩阵值对它们进行编码.

I have a NumPy array with integer values. Values of matrix range from 0 to max element in matrix(in other words, all numbers from 0 to max data element presented in it). I need to build effective( effective means fast fully-vectorized solution) for searching number of elements in each row and encode them according to matrix values.

我找不到类似的问题，或以某种方式帮助解决此问题的问题.

I could not find a similar question, or a question that somehow helped to solve this.

所以如果我在输入中有这个 data:

So if i have this data in input:

# shape is (N0=4, m0=4) 
1   1   0   4
2   4   2   1
1   2   3   5
4   4   4   1

所需的输出是:

# shape(N=N0, m=data.max()+1):
1   2   0   0   1   0
0   1   2   0   1   0
0   1   1   1   0   1
0   1   0   0   3   0

我知道如何通过简单地计算每一行 data 中的唯一值来解决这个问题，然后将 data 中的所有可能值考虑在内，然后组合结果大批.

I know how to solve this by simply counting unique values in each row of data iterating one by one, and then combining results taking in account all possible values in data array.

虽然使用 NumPy 进行矢量化，但关键问题是逐个搜索每个数字很慢，并且假设存在很多唯一数字，这不是有效的解决方案.一般来说，N 和唯一编号计数都比较大(顺便说一下，N 似乎比唯一编号计数大).

While using NumPy for vectorizing this the key problem is that searching each number one by one is slow and assuming that there are a lot of unique numbers presented, this can not be effective solution. Generally both N and unique numbers count is rather large(by the way, N seem to be larger than unique numbers count).

有人有好主意吗?)

推荐答案

好吧，这基本上就是 np.bincount 处理 1D 数组.但是，我们需要在每一行上迭代地使用它(简单地想一下).为了使其矢量化，我们可以将每一行偏移该最大数字.这个想法是为每一行设置不同的 bin，这样它们就不会受到具有相同编号的其他行元素的影响.

Well that's basically what does np.bincount does with 1D arrays. But, we need to use it on each row iteratively (thinking of it simply). To make it vectorized, we could offset each row by that max number. The idea is to have different bins for each row such that they are not affected by other row elements with same numbers.

因此，实现将是 -

# Vectorized solution
def bincount2D_vectorized(a):    
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

样品运行 -

In [189]: a
Out[189]: 
array([[1, 1, 0, 4],
       [2, 4, 2, 1],
       [1, 2, 3, 5],
       [4, 4, 4, 1]])

In [190]: bincount2D_vectorized(a)
Out[190]: 
array([[1, 2, 0, 0, 1, 0],
       [0, 1, 2, 0, 1, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 1, 0, 0, 3, 0]])

Numba 调整

我们可以引入 numba 以进一步加速.现在，numba 允许进行一些调整.

We can bring in numba for further speedups. Now, numba allows few tweaks.

首先，它允许 JIT 编译.

First off, it allows JIT compilation.

此外，最近他们引入了实验性的parallel 自动并行化已知具有并行语义的函数中的操作.

Also, recently they had introduced experimental parallel that automatically parallelizes operations in the function known to have parallel semantics.

最后的调整是使用 prange 作为 range 的替代品.文档指出这并行运行循环，类似于 OpenMP 并行循环和 Cython 的 prange.prange 在较大的数据集上表现良好，这可能是因为设置并行工作所需的开销.

Final tweak would be to use prange as a subsititute for range. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython’s prange. prange performs well with larger datasets, which probably is because of the overhead needed to setup the parallel work.

因此，通过这两个新的调整以及 njit 对于非 Python 模式，我们将有三个变体 -

So, with these new two tweaks along with the njit for no-Python mode, we would have three variants -

# Numba solutions
def bincount2D_numba(a, use_parallel=False, use_prange=False):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)

    # Choose fucntion based on args
    func = bincount2D_numba_func0
    if use_parallel:
        if use_prange:
            func = bincount2D_numba_func2
        else:
            func = bincount2D_numba_func1
    # Run chosen function on input data and output
    func(a, out, m, n)
    return out

@njit
def bincount2D_numba_func0(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func1(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func2(a, out, m, n):
    for i in prange(m):
        for j in prange(n):
            out[i,a[i,j]] += 1

为了完整性和稍后测试，循环版本将是 -

For completeness and testing out later on, the loopy version would be -

# Loopy solution
def bincount2D_loopy(a):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)
    for i in range(m):
        out[i] = np.bincount(a[i], minlength=N)
    return out

运行时测试

案例#1:

In [312]: a = np.random.randint(0,100,(100,100))

In [313]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
10000 loops, best of 3: 115 µs per loop
10000 loops, best of 3: 36.7 µs per loop
10000 loops, best of 3: 22.6 µs per loop
10000 loops, best of 3: 22.7 µs per loop
10000 loops, best of 3: 39.9 µs per loop

案例#2:

In [316]: a = np.random.randint(0,100,(1000,1000))

In [317]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 2.97 ms per loop
100 loops, best of 3: 3.54 ms per loop
1000 loops, best of 3: 1.83 ms per loop
100 loops, best of 3: 1.78 ms per loop
1000 loops, best of 3: 1.4 ms per loop

案例#3:

In [318]: a = np.random.randint(0,1000,(1000,1000))

In [319]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 4.01 ms per loop
100 loops, best of 3: 4.86 ms per loop
100 loops, best of 3: 3.21 ms per loop
100 loops, best of 3: 3.18 ms per loop
100 loops, best of 3: 2.45 ms per loop

似乎 numba 变体表现非常好.从三个变体中选择一个取决于输入数组形状参数，并在某种程度上取决于其中的唯一元素的数量.

Seems like the numba variants are performing very well. Choosing one out of the three variants would depend on the input array shape parameters and to some extent on the number of unique elements in it.

这篇关于每行的 Bin 元素 - NumPy 的矢量化 2D Bincount的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

每行的 Bin 元素 - NumPy 的矢量化 2D Bincount [英] Bin elements per row - Vectorized 2D Bincount for NumPy

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

每行的 Bin 元素 - NumPy 的矢量化 2D Bincount [英] Bin elements per row - Vectorized 2D Bincount for NumPy

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭