Numba jit warnings interpretation in Python


Question


I have defined the following recursive array generator and am using Numba jit to try and speed up the processing (based on this SO answer):

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
def calc_func(a, b, n):
    res = np.empty(n, dtype="float32")
    res[0] = 0
    for i in range(1, n):
        res[i] = a * res[i - 1] + (1 - a) * (b ** (i - 1))
    return res
a = calc_func(0.988, 0.9988, 5000)

I am getting a bunch of warnings/errors that I do not quite get. Would appreciate help in explaining them and making them disappear in order to (I'm assuming) speed up the calculation even more.

Here they are:

NumbaWarning: Compilation is falling back to object mode WITH looplifting enabled because Function "calc_func" failed type inference due to: Invalid use of Function(<built-in function empty>) with argument(s) of type(s): (int64, dtype=Literal[str](float32)) * parameterized

In definition 0: All templates rejected with literals.

In definition 1: All templates rejected without literals. This error is usually caused by passing an argument of a type that is unsupported by the named function.

[1] During: resolving callee type: Function(<built-in function empty>)

[2] During: typing of call at res = np.empty(n, dtype="float32")

File "thenameofmyscript.py", line 71:

def calc_func(a, b, n):
    res = np.empty(n, dtype="float32")
    ^

@jit("float32:", nopython=False, nogil=True)

thenameofmyscript.py:69: NumbaWarning: Compilation is falling back to object mode WITHOUT looplifting enabled because Function "calc_func" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>

File "thenameofmyscript.py", line 73:

def calc_func(a, b, n):
        <source elided>
        res[0] = 0
        for i in range(1, n):
        ^

@jit("float32:", nopython=False, nogil=True)

H:\projects\decay-optimizer\venv\lib\site-packages\numba\compiler.py:742: NumbaWarning: Function "calc_func" was compiled in object mode without forceobj=True, but has lifted loops.

File "thenameofmyscript.py", line 70:

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
    def calc_func(a, b, n):
    ^

self.func_ir.loc))

H:\projects\decay-optimizer\venv\lib\site-packages\numba\compiler.py:751: NumbaDeprecationWarning: Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

File "thenameofmyscript.py", line 70:

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
    def calc_func(a, b, n):
    ^

warnings.warn(errors.NumbaDeprecationWarning(msg, self.func_ir.loc))

thenameofmyscript.py:69: NumbaWarning: Code running in object mode won't allow parallel execution despite nogil=True. @jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)

Solution
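
First, a short interpretation of the warnings themselves: type inference fails at res = np.empty(n, dtype="float32"), because this Numba version cannot type a string literal passed as dtype in nopython mode (that is what "(int64, dtype=Literal[str](float32))" refers to). The whole function therefore falls back to slow object mode, which also explains the follow-up warnings about lifted loops and about nogil=True having no effect. A minimal sketch of the direct fix, under the assumption that the string dtype is the only thing blocking nopython mode (calc_func_fixed is a name I made up):

import numpy as np
from numba import njit

@njit("float32[:](float32,float32,intp)", nogil=True)
def calc_func_fixed(a, b, n):
    # pass the NumPy dtype object np.float32 instead of the string
    # "float32", so type inference succeeds and nopython mode is kept
    res = np.empty(n, dtype=np.float32)
    res[0] = 0
    for i in range(1, n):
        res[i] = a * res[i - 1] + (1 - a) * (b ** (i - 1))
    return res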

1. Optimize the function (algebraic simplification)

Modern CPUs are quite fast at addition, subtraction and multiplication. Operations like exponentiation should be avoided where possible.

Example

In this example I replaced the costly exponentiation with a simple multiplication, updating a running power of b instead of recomputing b ** (i - 1) on every iteration. Simplifications like this can lead to quite large speedups, but they may also change the result slightly.

First, your implementation in float64 and without any signature; I will come back to signatures later with another simple example.

import numpy as np
import numba as nb

#@nb.njit() is a shortcut for @nb.jit(nopython=True)
@nb.njit()
def calc_func_opt_1(a, b, n):
    #assumes n >= 3
    res = np.empty(n, dtype=np.float64)
    fact = b
    res[0] = 0.
    res[1] = a * res[0] + (1. - a) * 1.
    res[2] = a * res[1] + (1. - a) * fact
    for i in range(3, n):
        fact *= b  #running power: fact == b ** (i - 1)
        res[i] = a * res[i - 1] + (1. - a) * fact
    return res

It is also a good idea to use scalars where possible, here carrying the recurrence in a scalar instead of reading the previous value back from the array.

@nb.njit()
def calc_func_opt_2(a, b, n):
    #assumes n >= 3
    res = np.empty(n, dtype=np.float64)
    fact_1 = b
    fact_2 = 0.
    res[0] = fact_2
    fact_2 = a * fact_2 + (1. - a) * 1.
    res[1] = fact_2
    fact_2 = a * fact_2 + (1. - a) * fact_1
    res[2] = fact_2
    for i in range(3, n):
        fact_1 *= b
        fact_2 = a * fact_2 + (1. - a) * fact_1
        res[i] = fact_2
    return res
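
Since such rewrites may change the result, a quick sanity check against a direct transcription of the original recurrence is worthwhile. A minimal sketch, reusing np and the two optimized functions defined above (calc_func_ref and the default np.allclose tolerance are my assumptions):

def calc_func_ref(a, b, n):
    #direct float64 transcription of the original recurrence
    res = np.empty(n, dtype=np.float64)
    res[0] = 0.
    for i in range(1, n):
        res[i] = a * res[i - 1] + (1. - a) * b ** (i - 1)
    return res

ref = calc_func_ref(0.988, 0.9988, 5000)
assert np.allclose(calc_func_opt_1(0.988, 0.9988, 5000), ref)
assert np.allclose(calc_func_opt_2(0.988, 0.9988, 5000), ref)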

Timings

%timeit a = calc_func(0.988, 0.9988, 5000)
222 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a = calc_func_opt_1(0.988, 0.9988, 5000)
22.7 µs ± 45.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit a = calc_func_opt_2(0.988, 0.9988, 5000)
15.3 µs ± 35.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

2. Are signatures recommended?

In ahead-of-time (AOT) compilation mode signatures are necessary, but not in the usual JIT mode. The example above is not SIMD-vectorizable (each iteration depends on the previous one), so you won't see much positive or negative effect from a possibly suboptimal declaration of inputs and outputs. Let's look at another example.

#Numba is able to SIMD-vectorize this loop if
#a, b, res are contiguous arrays
@nb.njit(fastmath=True)
def some_function_1(a, b):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res

@nb.njit("float64[:](float64[:],float64[:])", fastmath=True)
def some_function_2(a, b):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res

a = np.random.rand(10_000)
b = np.random.rand(10_000)

#Example for non-contiguous input
#a = np.random.rand(10_000)[0::2]
#b = np.random.rand(10_000)[0::2]

%timeit res=some_function_1(a,b)
5.59 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit res=some_function_2(a,b)
9.36 µs ± 47.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Why is the version with signatures slower?

Let's have a closer look at the signatures Numba actually compiled.

some_function_1.nopython_signatures
#[(array(float64, 1d, C), array(float64, 1d, C)) -> array(float64, 1d, C)]
#this C-contiguous signature is equivalent to
#"float64[::1](float64[::1],float64[::1])"
some_function_2.nopython_signatures
#[(array(float64, 1d, A), array(float64, 1d, A)) -> array(float64, 1d, A)]
#the declared "float64[:]" only promises A (any) layout

If the memory layout is unknown at compile time, it is often impossible to SIMD-vectorize the algorithm. Of course you can explicitly declare C-contiguous arrays, but the function won't work anymore for non-contiguous inputs, which is normally not intended. A sketch of such an explicit declaration follows.
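
For completeness, a sketch of that explicit C-contiguous declaration ("[::1]" instead of "[:]"; some_function_3 is a name I made up). It keeps the loop SIMD-vectorizable despite the declared signature, but calling it with a strided view such as a[0::2] fails at dispatch with a TypeError:

@nb.njit("float64[::1](float64[::1],float64[::1])", fastmath=True)
def some_function_3(a, b):
    #identical kernel; the signature now promises C-contiguous
    #inputs and output, so the memory layout is known at compile time
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res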
