Performance loss in numba compiled logic comparison

Published: 2021/6/15 19:45:11. Tags: python, performance, compiler-construction, numba


Question

What could cause the performance degradation in the following numba-compiled function for logic comparison?

from numba import njit

t = (True, 'and_', False)

#@njit(boolean(boolean, unicode_type, boolean))    
@njit
def f(a,b,c):
    if b == 'and_':
        out = a&c
    elif b == 'or_':
        out = a|c
    return out
x = f(*t)
%timeit f(*t)
#1.78 µs ± 9.52 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit f.py_func(*t)
#108 ns ± 0.0042 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

To test this at scale, as suggested in the answer:

import numpy as np

x = np.random.choice([True,False], 1000000)
y = np.random.choice(["and_","or_"], 1000000)
z = np.random.choice([False, True], 1000000)

#using jit compiled f
def f2(x,y,z):
    L = x.shape[0]
    out = np.empty(L)
    for i in range(L):
        out[i] = f(x[i],y[i],z[i])
    return out

%timeit f2(x,y,z)
#2.79 s ± 86.4 ms per loop

#using pure Python f
def f3(x,y,z):
    L = x.shape[0]
    out = np.empty(L)
    for i in range(L):
        out[i] = f.py_func(x[i],y[i],z[i])
    return out

%timeit f3(x,y,z)
#572 ms ± 24.3 ms per loop

Am I missing something, and is there a way to compile a "fast" version? This is going to be part of a loop executed ~1e6 times.

Recommended answer

You are working at too fine a granularity, and Numba is not designed for that. Almost all of the execution time you see comes from the overhead of wrapping and unwrapping parameters, type checks, Python function wrapping, reference counting, and so on. Moreover, the benefit of using Numba is very small here, since Numba barely optimizes unicode string operations.

One way to check this hypothesis is to time the following trivial function:

@njit
def f(a,b,c):
    return a
x = f(True, 'and_', False)
%timeit f(True, 'and_', False)

Both the trivial function and the original version take 1.34 µs on my machine.

Additionally, you can disassemble the Numba function to see how many instructions are executed for a single call and to understand in detail where the overhead comes from.

If you want Numba to be useful, you need to put more work into the compiled function, possibly by working directly on arrays or lists. If this is not possible because of the dynamic nature of the input types, then Numba may not be the right tool here. You could try to rework your code a bit and use PyPy instead. Writing a native C/C++ module may help a little, but most of the time will still be spent manipulating dynamic objects and unicode strings and doing type introspection, unless you rewrite the whole code.
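As a rough illustration of moving the loop into the compiled function, the sketch below processes whole arrays in one call. It is not from the original answer: the name `apply_ops` and the idea of encoding the operator as a boolean mask (`is_and`, computed with NumPy before entering compiled code so that no strings are handled inside Numba) are assumptions made for this example. The `ImportError` fallback only keeps the sketch runnable where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for this sketch): run as plain Python without Numba.
    def njit(func):
        return func


@njit
def apply_ops(a, is_and, c):
    # Hypothetical batch version of f: one call handles every element,
    # so the Python/Numba boundary is crossed once instead of once per item.
    n = a.shape[0]
    out = np.empty(n, dtype=np.bool_)
    for i in range(n):
        if is_and[i]:
            out[i] = a[i] & c[i]
        else:
            out[i] = a[i] | c[i]
    return out


# Small inputs shaped like the arrays in the question's large-scale test.
rng = np.random.default_rng(0)
x = rng.random(1000) < 0.5
y = rng.choice(np.array(["and_", "or_"]), 1000)
z = rng.random(1000) < 0.5

# The string comparison happens once, vectorized, on the NumPy side.
result = apply_ops(x, y == "and_", z)
```

The design choice here is to keep string handling out of the compiled loop entirely, since the answer notes that Numba barely optimizes unicode string operations.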

Update

The above overhead is only paid when crossing from Python types to Numba (and the other way around). You can see that with the following benchmark:

@njit
def f(a,b,c):
    if b == 'and_':
        out = a&c
    elif b == 'or_':
        out = a|c
    return out

@njit
def manyCalls(a, b, c):
    res = True
    for i in range(1_000_000):
        res ^= f(a, b, c ^ res)
    return res

t = (True, 'and_', False)
x = manyCalls(*t)
%timeit manyCalls(*t)

Calling manyCalls takes 3.62 ms on my machine. This means each call to f takes 3.6 ns on average (16 cycles). In other words, the overhead is paid only once, when manyCalls is called from Python.
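The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked calculation using the timings quoted in the answer; the clock frequency is not stated in the original, so the value computed below is merely what the "16 cycles" figure would imply.

```python
# Timings quoted in the answer (measured on the answerer's machine).
total_seconds = 3.62e-3      # one call to manyCalls
calls = 1_000_000            # iterations of the inner loop
boundary_ns = 1.34e3         # one call to f from Python (1.34 µs)

# Average cost of one call to f from inside compiled code.
per_call_ns = total_seconds / calls * 1e9

# Clock frequency implied by "16 cycles" (an inference, not a stated fact).
implied_ghz = 16 / per_call_ns

# How much more expensive a call across the Python/Numba boundary is.
ratio = boundary_ns / per_call_ns

print(f"{per_call_ns:.2f} ns per call, ~{implied_ghz:.1f} GHz implied, "
      f"boundary call ~{ratio:.0f}x slower")
```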
