提高比较算法np.packbits(A == A [:, None],axis = 1)的性能 [英] Improving performance on comparison algorithm np.packbits(A==A[:, None], axis=1)

查看:109
本文介绍了提高比较算法np.packbits(A == A [:, None],axis = 1)的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找内存优化np.packbits(A==A[:, None], axis=1),其中A是长度为n的整数的密集数组. A==A[:, None]占用大量的n内存,因为生成的布尔数组存储效率低下,每个布尔值占用1个字节.

I am looking to memory optimise np.packbits(A==A[:, None], axis=1), where A is dense array of integers of length n. A==A[:, None] is memory hungry for large n since the resulting Boolean array is stored inefficiently with each Boolean value costing 1 byte.

我编写了以下脚本,以实现相同的结果,同时一次将一个位打包.但是,它的速度要慢3倍左右,因此我正在寻找加快速度的方法.或者,也可以选择一种内存开销较小的更好算法.

I wrote the below script to achieve the same result while packing bits one section at a time. It is, however, around 3x slower, so I am looking for ways to speed it up. Or, alternatively, a better algorithm with small memory overhead.

注意:这是我先前提出的问题的后续问题; 有效地将numpy数组与按元素进行比较.

Note: this is a follow-up question to one I asked earlier; Comparing numpy array with itself by element efficiently.

以下用于基准测试的可复制代码.

Reproducible code below for benchmarking.

import numpy as np
from numba import jit

@jit(nopython=True)
def bool2int(x):
    y = 0
    for i, j in enumerate(x):
        if j: y += int(j)<<(7-i)
    return y

@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)
    for row in range(n):
        for col in range(n):

            section[col%8] = arr[row] == arr[col]

            if ((col + 1) % 8 == 0) or (col == (n-1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0

    return result

n = 10000
A = np.random.randint(0, 1000, n)

result_arr = np.zeros((n, n // 8 if n % 8 == 0 else n // 8 + 1)).astype(np.uint8)
selection_arr = np.zeros(8).astype(np.uint8)

# memory efficient version, but slow
packed = compare_elementwise(A, result_arr, selection_arr)

# memory inefficient version, but fast
packed2 = np.packbits(A == A[:, None], axis=1)

assert (packed == packed2).all()

%timeit compare_elementwise(A, result_arr, selection_arr)  # 1.6 seconds
%timeit np.packbits(A == A[:, None], axis=1)  # 0.460 second

推荐答案

这里的解决方案比numpy的解决方案快3倍(a.size必须是8的倍数;请参见下文):

Here is a solution 3 times faster than the numpy one (a.size must be a multiple of 8; see below) :

@nb.njit
def comp(a):
    res=np.zeros((a.size,a.size//8),np.uint8)
    for i,x in enumerate(a):
        for j,y in enumerate(a):
            if x==y: res[i,j//8] |= 128 >> j%8 
    return res

之所以有用,是因为阵列被扫描了一次,在此过程中您进行了多次扫描, 几乎所有术语都是空的.

This works because the array is scanned one time, where you do it many times, and amost all terms are null.

In [122]: %timeit np.packbits(A == A[:, None], axis=1)
389 ms ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [123]: %timeit comp(A)
123 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

如果为a.size%8 > 0,则查找信息的成本会更高.在这种情况下,最好的方法是在初始数组中填充一些(在range(7)中)零.

If a.size%8 > 0, the cost for find back the information will be higher. The best way in this case is to pad the initial array with some (in range(7)) zeros.

为完整起见,可以这样填充:

For completeness, the padding could be done as so:

if A.size % 8 != 0: A = np.pad(A, (0, 8 - A.size % 8), 'constant', constant_values=0)

这篇关于提高比较算法np.packbits(A == A [:, None],axis = 1)的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆