Numba `cache=True` has no effect

Question

I wrote the code below to test Numba's caching feature:

import numba
import numpy as np
import time
@numba.njit(cache=True)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result
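a=np.random.random((1000,100))
print(time.time())  # timestamp before the first call
sum2d(a)            # first call: compiles, or loads from the cache if available
print(time.time())  # timestamp after the first call
print(time.time())  # timestamp before the second call
sum2d(a)            # second call: reuses the already-compiled code
print(time.time())  # timestamp after the second call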

Although some cache files are generated in the `__pycache__` folder, the timing is always roughly the same, for example:

1576855294.8787484
1576855295.5378428
1576855295.5378428
1576855295.5388253

no matter how many times I run the script, which means that the first call to `sum2d` always spends extra time compiling. So what are the cache files in the `__pycache__` folder for?
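To see which cache files a run actually produced, a short standard-library listing is enough. The sketch below assumes Numba's usual layout: an index file (`.nbi`) and one or more data files (`.nbc`) written under `__pycache__` next to the source file, unless the `NUMBA_CACHE_DIR` environment variable points elsewhere. The helper name `list_numba_cache_files` is hypothetical. Note that the presence of these files only shows that a cache was written, not that a later run loaded it.

```python
from pathlib import Path


def list_numba_cache_files(script_dir="."):
    """Return the sorted names of Numba cache files in <script_dir>/__pycache__.

    Assumes the usual extensions: .nbi (index) and .nbc (compiled data).
    Ordinary .pyc bytecode files are ignored.
    """
    cache_dir = Path(script_dir) / "__pycache__"
    if not cache_dir.is_dir():
        return []
    return sorted(
        p.name for p in cache_dir.iterdir() if p.suffix in {".nbi", ".nbc"}
    )


if __name__ == "__main__":
    # Run from the directory that contains the Numba script.
    print(list_numba_cache_files("."))
```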

Recommended answer

The following script illustrates the point of `cache=True`. It first calls a non-cached `dummy` function that absorbs the time it takes to initialize Numba. It then calls the uncached `sum2d` function (`sum2d_nocache`) twice and the cached one (`sum2d_cache`) twice.

import numba
import numpy as np
import time

# Non-cached trivial function; calling it first absorbs Numba's initialization time
@numba.njit
def dummy():
    return None

# Compiled from scratch in every run of the script
@numba.njit
def sum2d_nocache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

# Compiled once, then loaded from the on-disk cache in later runs
@numba.njit(cache=True)
def sum2d_cache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

start = time.time()
dummy()
end = time.time()
print(f'Dummy timing {end - start}')

a=np.random.random((1000,100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 1st timing {end - start}')

a=np.random.random((1000,100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 2nd timing {end - start}')

a=np.random.random((1000,100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 1st timing {end - start}')

a=np.random.random((1000,100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 2nd timing {end - start}')
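The repeated `start`/`end` bookkeeping in the script can be factored into a small helper. The sketch below (the name `timed_call` is hypothetical) uses `time.perf_counter()`, a monotonic, high-resolution clock that is better suited to timing short calls than `time.time()`. With Numba installed, `timed_call(sum2d_cache, a)` would replace each start/call/end triple; the demo uses the built-in `sum` only so that it runs without Numba.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    time.perf_counter() is monotonic and high resolution, so it suits
    timing short calls better than time.time().
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Plain-Python stand-in; with Numba you would pass e.g. sum2d_cache and `a`.
    _, t1 = timed_call(sum, range(1_000_000))
    _, t2 = timed_call(sum, range(1_000_000))
    print(f"1st call {t1:.6f} s, 2nd call {t2:.6f} s")
```

For a jitted function, the first measurement in each run is the one that includes compilation (or loading from the cache), so that is where the effect of `cache=True` shows up; later calls measure only execution.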

Output after the first run:

    Dummy timing 0.10361385345458984
    No cache 1st timing 0.08893513679504395
    No cache 2nd timing 0.00020122528076171875
    Cache 1st timing 0.08929300308227539
    Cache 2nd timing 0.00015544891357421875

Output after the second run:

    Dummy timing 0.08973526954650879
    No cache 1st timing 0.0809786319732666
    No cache 2nd timing 0.0001163482666015625
    Cache 1st timing 0.0016787052154541016
    Cache 2nd timing 0.0001163482666015625

What does this output tell us?

  • The time to initialize Numba is not negligible.
  • During the first run, the first call to both the cached and the uncached version takes longer because of compilation.
  • In this example, writing the cache file adds almost no overhead.
  • In the second run, the first call to the cached function is much faster, because the compiled code is loaded from the cache instead of being recompiled (this is what `cache=True` is for).
  • Subsequent calls to the cached and uncached functions take approximately the same time.
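These observations can be checked with a little arithmetic on the reported numbers. Subtracting the steady-state (second-call) time of `sum2d_cache` from its first-call time roughly isolates the one-off cost paid in each run: compilation plus writing the cache in the first run, loading from the cache in the second. This uses a single pair of runs, so treat the result as an order-of-magnitude estimate rather than a benchmark; the variable and function names are chosen here for illustration.

```python
# Timings in seconds, copied from the "Cache 1st/2nd timing" lines of the two runs above.
run1 = {"first_call": 0.08929300308227539, "second_call": 0.00015544891357421875}
run2 = {"first_call": 0.0016787052154541016, "second_call": 0.0001163482666015625}


def one_off_cost(run):
    """First-call time minus steady-state (second-call) time: a rough
    estimate of the one-off cost paid by the first call."""
    return run["first_call"] - run["second_call"]


compile_cost = one_off_cost(run1)  # run 1: compile and write the cache file
load_cost = one_off_cost(run2)     # run 2: load the compiled code from the cache
print(f"compile ~{compile_cost * 1000:.1f} ms, cache load ~{load_cost * 1000:.2f} ms, "
      f"about {compile_cost / load_cost:.0f}x cheaper to load")
```

With these figures the estimate is roughly 89 ms for compiling versus about 1.6 ms for loading, i.e. loading the cached code was around 57 times cheaper in that measurement.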

The point of `cache=True` is to avoid paying the compile time of large, complex functions on every run of a script. In this example the function is simple, so the saving is small, but for a script with many more complex functions, caching can significantly reduce the run time. Caching does not remove Numba's own initialization cost (the `dummy` timing is about the same in both runs), which is likely why, in the question's script, the first call to `sum2d` still looks slow even when the cache is used.
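To make the mechanism concrete, here is a toy, pure-Python model of an on-disk compile cache. It is not how Numba is implemented: Numba's real cache stores machine code per signature (argument types) and invalidates entries when the source file changes, whereas this sketch just pickles a dummy "artifact" keyed by a string. All names (`simulated_compile`, `load_or_compile`, `simulate_run`) and the key string are hypothetical. Each call to `simulate_run` stands in for a fresh run of the script, since nothing is kept in memory between calls; only the file on disk carries over.

```python
import hashlib
import os
import pickle
import tempfile
import time


def simulated_compile(key):
    """Hypothetical stand-in for an expensive compilation step."""
    time.sleep(0.1)  # pretend compiling takes ~100 ms
    return {"key": key, "machine_code": b"\x90" * 16}


def load_or_compile(key, cache_dir):
    """Return (artifact, cache_hit): reuse an artifact saved on disk by an
    earlier run if one exists, otherwise compile it and save it."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, digest + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True
    artifact = simulated_compile(key)
    with open(path, "wb") as f:
        pickle.dump(artifact, f)
    return artifact, False


def simulate_run(key, cache_dir):
    """One 'script run': time the first call, which compiles or loads."""
    start = time.perf_counter()
    _, hit = load_or_compile(key, cache_dir)
    return hit, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        key = "sum2d(array(float64, 2d, C))"  # hypothetical cache key
        for run in (1, 2):
            hit, elapsed = simulate_run(key, cache_dir)
            print(f"run {run}: cache_hit={hit}, first call took {elapsed:.4f} s")
```

The first simulated run pays the full "compile" cost and writes the file; the second finds the file and skips compilation, which mirrors the drop in "Cache 1st timing" between the two real runs above.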
