numba @jit slower than pure python?
Question
So I need to improve the execution time of a script I have been working on. I started working with the numba jit decorator to try parallel computing, but it throws:
KeyError: "Does not support option: 'parallel'"
So I decided to test whether nogil unlocks the full capabilities of my CPU, but it was slower than pure Python. I don't understand why this happened, and if someone can help or guide me I would be very grateful.
import numpy as np
from numba import *
@jit(['float64[:,:],float64[:,:]'],'(n,m),(n,m)->(n,m)',nogil=True)
def asd(x,y):
    return x+y
u=np.random.random(100)
w=np.random.random(100)
%timeit asd(u,w)
%timeit u+w
10000 loops, best of 3: 137 µs per loop
The slowest run took 7.13 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.75 µs per loop
Answer
You cannot expect numba to outperform numpy on such a simple vectorized operation. Also, your comparison isn't entirely fair, since the numba function includes the overhead of the outside function call. If you sum a larger array, you'll see that the performance of the two converges, and what you are seeing is just overhead on a very fast operation:
import numpy as np
import numba as nb
@nb.njit
def asd(x,y):
    return x+y

def asd2(x, y):
    return x + y
u=np.random.random(10000)
w=np.random.random(10000)
%timeit asd(u,w)
%timeit asd2(u,w)
The slowest run took 17796.43 times longer than the fastest. This could mean
that an intermediate result is being cached.
100000 loops, best of 3: 6.06 µs per loop
The slowest run took 29.94 times longer than the fastest. This could mean that
an intermediate result is being cached.
100000 loops, best of 3: 5.11 µs per loop
As far as parallel functionality goes, for this simple operation you can use nb.vectorize:
@nb.vectorize([nb.float64(nb.float64, nb.float64)], target='parallel')
def asd3(x, y):
    return x + y
u=np.random.random((100000, 10))
w=np.random.random((100000, 10))
%timeit asd(u,w)
%timeit asd2(u,w)
%timeit asd3(u,w)
But again, if you operate on small arrays, you are going to see the overhead of thread dispatch. For the array sizes above, I see the parallel version giving me a 2x speedup.
Where numba really shines is in operations that are difficult to express in numpy via broadcasting, or when an expression would allocate a lot of temporary intermediate arrays.