When is numba effective?


Problem Description


I know numba creates some overhead, and in some situations (non-intensive computation) it becomes slower than pure Python. But what I don't know is where to draw the line. Is it possible to use the order of algorithmic complexity to figure out where?

For example, in this code adding two arrays (~O(n)) shorter than 5 elements is faster in pure Python:

import numba
import numpy as np

def sum_1(a, b):
    result = 0.0
    for i, j in zip(a, b):
        result += (i + j)
    return result

# explicit signature: two float64 arrays in, one float64 scalar out
@numba.jit('float64(float64[:],float64[:])')
def sum_2(a, b):
    result = 0.0
    for i, j in zip(a, b):
        result += (i + j)
    return result

# try 100
a = np.linspace(1.0,2.0,5)
b = np.linspace(1.0,2.0,5)
print("pure python: ")
%timeit -o sum_1(a,b)
print("\n\n\n\npython + numba: ")
%timeit -o sum_2(a,b)

UPDATE: what I am looking for is a guideline similar to this one:

"A general guideline is to choose different targets for different data sizes and algorithms. The "cpu" target works well for small data sizes (approx. less than 1KB) and low compute intensity algorithms. It has the least amount of overhead. The "parallel" target works well for medium data sizes (approx. less than 1MB). Threading adds a small delay. The "cuda" target works well for big data sizes (approx. greater than 1MB) and high compute intensity algorithms. Transfering memory to and from the GPU adds significant overhead."

Solution

It's hard to draw the line where numba becomes effective. However, there are a few indicators that it might not be effective:

  • If you cannot use jit with nopython=True - whenever you cannot compile it in nopython mode, you are either trying to compile too much or it won't be significantly faster (see the sketch after this list).

  • If you don't use arrays - when you pass lists or other non-array types to the numba function (except from other numba functions), numba needs to copy them, which incurs a significant overhead.

  • If there is already a NumPy or SciPy function that does it - even if numba can be significantly faster for short arrays, the NumPy or SciPy function will almost always be as fast for longer arrays (and you might easily overlook some common edge cases that these functions handle).
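To illustrate the first two points, a minimal sketch (sum_positive is a made-up example function): compiling in nopython mode via njit works directly on NumPy arrays, while passing a plain Python list forces numba to convert it on every call (depending on the numba version this also emits a deprecation warning for reflected lists):

import numba
import numpy as np

@numba.njit  # shorthand for @numba.jit(nopython=True)
def sum_positive(arr):
    total = 0.0
    for x in arr:
        if x > 0.0:
            total += x
    return total

arr = np.random.randn(1000)
sum_positive(arr)        # NumPy array: compiles and runs in nopython mode

py_list = arr.tolist()
sum_positive(py_list)    # plain list: numba has to convert/copy it on
                         # every call, which adds significant overhead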

There's also another reason why you might not want to use numba in cases where it's just "a bit" faster than other solutions: numba functions have to be compiled, either ahead of time or when first called, and in some situations the compilation takes much longer than the time you save, even if you call the function hundreds of times. The compilation times also add up: numba is slow to import, and compiling the numba functions adds overhead of its own. It doesn't make sense to shave off a few milliseconds if the import overhead increases by 1-10 seconds.
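A quick way to see that one-off cost is to time the first call (which triggers compilation) separately from the second. A minimal sketch; as an aside, numba's cache=True option can persist the compiled code across sessions:

import time
import numba
import numpy as np

@numba.njit
def total(arr):
    s = 0.0
    for x in arr:
        s += x
    return s

arr = np.ones(10)

t0 = time.perf_counter()
total(arr)   # first call: includes type inference and compilation
print("first call: ", time.perf_counter() - t0)

t0 = time.perf_counter()
total(arr)   # second call: runs the already-compiled machine code
print("second call:", time.perf_counter() - t0)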

Also, numba is complicated to install (without conda at least), so if you want to share your code you have a really "heavy" dependency.


Your example lacks a comparison with NumPy functions and with a highly optimized pure-Python version. I added some comparison functions and ran a benchmark (using my library simple_benchmark):

import numpy as np
import numba as nb
from itertools import chain

def python_loop(a,b):
    result = 0.0
    for i,j in zip(a,b):
        result += (i+j)
    return result

@nb.njit
def numba_loop(a, b):
    result = 0.0
    for i, j in zip(a, b):
        result += (i + j)
    return result

def numpy_methods(a, b):
    return a.sum() + b.sum()

def python_sum(a, b):
    return sum(chain(a.tolist(), b.tolist()))

from simple_benchmark import benchmark, MultiArgument

# array sizes from 2**2 to 2**16 elements; numba_loop is warmed up first
# so its compilation time is not included in the measurements
arguments = {
    2**i: MultiArgument([np.zeros(2**i), np.zeros(2**i)])
    for i in range(2, 17)
}
b = benchmark([python_loop, numba_loop, numpy_methods, python_sum], arguments, warmups=[numba_loop])

%matplotlib notebook
b.plot()

Yes, the numba function is the fastest for small arrays; however, the NumPy solution will be slightly faster for longer arrays. The Python solutions are slower, but the "faster" alternative is already significantly faster than your originally proposed solution.

In this case I would simply use the NumPy solution because it's short, readable and fast, except when you're dealing with lots of short arrays and calling the function many times - then the numba solution would be significantly better (as sketched below).
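For that last case, a hedged sketch of the "many short arrays" workload, reusing numba_loop and numpy_methods from the benchmark above (the pair count and array length are made up for illustration):

import numpy as np

# 10_000 independent pairs of length-5 arrays: per-call overhead dominates
# the actual summation work, so the compiled numba_loop tends to win here
pairs = [(np.random.rand(5), np.random.rand(5)) for _ in range(10_000)]

numba_total = sum(numba_loop(a, b) for a, b in pairs)
numpy_total = sum(numpy_methods(a, b) for a, b in pairs)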
