Numba 3x slower than numpy

Question

We have a vectorized numpy function, get_pos_neg_bitwise, that uses mask = [132 20 192] on an array df of shape (500e3, 4), and we want to accelerate it with numba.

from numba import jit
import numpy as np
from time import time

def get_pos_neg_bitwise(df, mask):
    """
    In [1]: print(mask)
    [132  20 192]

    In [1]: print(df)
    [[  1 162  97  41]
     [  0 136 135 171]
     ...,
     [  0 245  30  73]]

    """
    check = (np.bitwise_and(mask, df[:, 1:]) == mask).all(axis=1)
    pos = (df[:, 0] == 1) & check
    neg = (df[:, 0] == 0) & check
    pos = np.nonzero(pos)[0]
    neg = np.nonzero(neg)[0]
    return (pos, neg)
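
On a tiny hand-made input (the values are chosen only for illustration), the check in get_pos_neg_bitwise keeps a row only when every bit set in mask is also set in the corresponding column; extra bits in the data do not matter:

```python
import numpy as np

mask = np.array([132, 20, 192])
df = np.array([
    [1, 255, 255, 255],  # every mask bit present, target 1
    [0, 132,  20, 192],  # exactly the mask bits, target 0
    [1, 128,  20, 192],  # 128 & 132 == 128: the bit with value 4 is missing
    [0, 133,  21, 193],  # extra bits are allowed, target 0
])

# A row passes when (mask & value) == mask in every feature column,
# i.e. every bit set in mask is also set in that column's value.
check = (np.bitwise_and(mask, df[:, 1:]) == mask).all(axis=1)
pos = np.nonzero((df[:, 0] == 1) & check)[0]
neg = np.nonzero((df[:, 0] == 0) & check)[0]
print(pos, neg)  # [0] [1 3]
```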

Using tips from @morningsun, we made this numba version:

@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    posneg = np.zeros((df.shape[0], 2))
    for idx in range(df.shape[0]):
        vandmask = np.bitwise_and(df[idx, 1:], mask)

        # numba fails to compile: if np.all(vandmask == mask):
        vandm_equal_m = 1
        for i, val in enumerate(vandmask):
            if val != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                posneg[idx, 0] = 1
            else:
                posneg[idx, 1] = 1
    pos = list(np.nonzero(posneg[:, 0])[0])
    neg = list(np.nonzero(posneg[:, 1])[0])
    return (pos, neg)

But it is still 3 times slower than the numpy version (~0.06 s vs. ~0.02 s).

if __name__ == '__main__':

    df = np.random.randint(256, size=(int(500e3), 4))
    df[:, 0] = np.random.randint(2, size=df.shape[0])  # set target to 0 or 1
    mask = np.array([132,  20, 192])

    start = time()
    pos, neg = get_pos_neg_bitwise(df, mask)
    msg = '==> pos, neg made; p={}, n={} in [{:.4} s] numpy'
    print(msg.format(len(pos), len(neg), time() - start))

    start = time()
    msg = '==> pos, neg made; p={}, n={} in [{:.4} s] numba'
    pos, neg = numba_get_pos_neg_bitwise(df, mask)  # first call includes JIT compilation
    print(msg.format(len(pos), len(neg), time() - start))
    start = time()
    pos, neg = numba_get_pos_neg_bitwise(df, mask)  # already compiled
    print(msg.format(len(pos), len(neg), time() - start))

Am I missing something?

In [1]: %run numba_test2.py
==> pos, neg made; p=3852, n=3957 in [0.02306 s] numpy
==> pos, neg made; p=3852, n=3957 in [0.3492 s] numba
==> pos, neg made; p=3852, n=3957 in [0.06425 s] numba
In [1]:
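
The first numba line in this output (0.3492 s) includes the one-off JIT compilation that numba performs the first time the function is called with a given set of argument types, so only the second numba line is comparable to the numpy timing. A minimal sketch of a fairer timing helper using only the standard library; best_time is a hypothetical name, not part of numba or numpy:

```python
import time


def best_time(fn, *args, repeat=5):
    """Return the best wall-clock time of fn(*args) over `repeat` runs.

    One untimed call is made first, so for a lazily compiled @jit
    function the one-off compilation cost is excluded.
    """
    fn(*args)  # warm-up call (triggers JIT compilation for numba functions)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

Usage would look like best_time(numba_get_pos_neg_bitwise, df, mask) next to best_time(get_pos_neg_bitwise, df, mask). Taking the minimum rather than the mean reduces noise from other processes; IPython's %timeit applies a similar best-of-N idea.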

Answer

Try moving the call to np.bitwise_and outside of the loop. Inside the loop it allocates a new small array for every one of the 500e3 rows, and numba can't do anything to speed that up:

@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    posneg = np.zeros((df.shape[0], 2))
    vandmask = np.bitwise_and(df[:, 1:], mask)

    for idx in range(df.shape[0]):

        # numba fails to compile: if np.all(vandmask == mask):
        vandm_equal_m = 1
        for i, val in enumerate(vandmask[idx]):
            if val != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                posneg[idx, 0] = 1
            else:
                posneg[idx, 1] = 1
    pos = np.nonzero(posneg[:, 0])[0]
    neg = np.nonzero(posneg[:, 1])[0]
    return (pos, neg)

Then I get the following timings:

==> pos, neg made; p=3920, n=4023 in [0.02352 s] numpy
==> pos, neg made; p=3920, n=4023 in [0.2896 s] numba
==> pos, neg made; p=3920, n=4023 in [0.01539 s] numba

So now numba is a bit faster than numpy.

Also, it didn't make a huge difference, but in your original function you return numpy arrays, while in the numba version you were converting pos and neg to lists.

In general, though, I would guess that the runtime is dominated by numpy functions, which numba can't speed up, and that the numpy version of the code is already using fast vectorized routines.

Update:

You can make it faster by removing the enumerate call and indexing directly into the array instead of grabbing a slice. Also, splitting posneg into separate pos and neg arrays helps avoid slicing along a non-contiguous axis in memory:

@jit(nopython=True)
def numba_get_pos_neg_bitwise(df, mask):
    pos = np.zeros(df.shape[0])
    neg = np.zeros(df.shape[0])
    vandmask = np.bitwise_and(df[:, 1:], mask)

    for idx in range(df.shape[0]):

        # numba fails to compile: if np.all(vandmask == mask):
        vandm_equal_m = 1
        for i in range(vandmask.shape[1]):  # xrange does not exist in Python 3
            if vandmask[idx,i] != mask[i]:
                vandm_equal_m = 0
                break
        if vandm_equal_m == 1:
            if df[idx, 0] == 1:
                pos[idx] = 1
            else:
                neg[idx] = 1
    pos = np.nonzero(pos)[0]
    neg = np.nonzero(neg)[0]
    return pos, neg
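
To see the contiguity point concretely: in the previous version, posneg[:, 0] is a strided view into an (N, 2) float64 array, while a separate 1-D array is contiguous. The array flags and strides show the difference (N here is only illustrative):

```python
import numpy as np

n = 500_000
posneg = np.zeros((n, 2))  # float64, C order: each row occupies 16 bytes
col = posneg[:, 0]         # view that steps over the interleaved second column
pos = np.zeros(n)          # separate 1-D array

print(col.flags['C_CONTIGUOUS'], col.strides)  # False (16,)
print(pos.flags['C_CONTIGUOUS'], pos.strides)  # True (8,)
```

Scanning col touches twice as much memory as the data it holds, whereas a scan over pos walks consecutive elements, which is generally friendlier to the cache.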

And the timings in an IPython notebook:

    %timeit pos1, neg1 = get_pos_neg_bitwise(df, mask)
    %timeit pos2, neg2 = numba_get_pos_neg_bitwise(df, mask)

    100 loops, best of 3: 18.2 ms per loop
    100 loops, best of 3: 7.89 ms per loop
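
A further step in the same direction, sketched here as a variation rather than as part of the answer: test each mask bit directly on df so the (N, 3) vandmask temporary is never built, and write row indices straight into preallocated buffers instead of calling np.nonzero afterwards. fused_pos_neg is a hypothetical name, and the fallback decorator that runs it as plain (slow) Python when numba is not installed is an assumption added so the snippet runs anywhere. Rows whose target is neither 0 nor 1 are ignored, matching the numpy version. No timings are claimed; measure on your own data.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python if numba is absent
    def njit(fn):
        return fn


@njit
def fused_pos_neg(df, mask):
    n = df.shape[0]
    # Worst-case index buffers; trimmed to the used length at the end.
    pos = np.empty(n, dtype=np.int64)
    neg = np.empty(n, dtype=np.int64)
    n_pos = 0
    n_neg = 0
    for idx in range(n):
        ok = True
        for j in range(mask.shape[0]):
            # Check the mask bits in place; no temporary array is allocated.
            if (df[idx, j + 1] & mask[j]) != mask[j]:
                ok = False
                break
        if ok:
            if df[idx, 0] == 1:
                pos[n_pos] = idx
                n_pos += 1
            elif df[idx, 0] == 0:
                neg[n_neg] = idx
                n_neg += 1
    # Copy so the oversized buffers can be released.
    return pos[:n_pos].copy(), neg[:n_neg].copy()
```

Compared with the version above, this trades the vectorized np.bitwise_and call for a plain loop, which is the kind of code numba compiles well; whether it is actually faster depends on the data and hardware.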
