Why is numba faster than numpy here?

Views: 306 | Published: 2020/5/18 19:39:55 | Tags: python numpy numba

Question

I can't figure out why numba is beating numpy here (over 3x). Did I make some fundamental error in how I am benchmarking here? Seems like the perfect situation for numpy, no? Note that as a check, I also ran a variation combining numba and numpy (not shown), which as expected was the same as running numpy without numba.

(btw this is a followup question to: Fastest way to numerically process 2d-array: dataframe vs series vs array vs numba )

import numpy as np
from numba import jit
nobs = 10000 

def proc_numpy(x,y,z):

   x = x*2 - ( y * 55 )      # these 4 lines represent use cases
   y = x + y*2               # where the processing time is mostly
   z = x + y + 99            # a function of, say, 50 to 200 lines
   z = z * ( z - .88 )       # of fairly simple numerical operations

   return z

@jit
def proc_numba(xx,yy,zz):
   for j in range(nobs):     # as pointed out by Llopis, this for loop 
      x, y = xx[j], yy[j]    # is not needed here.  it is here by 
                             # accident because in the original benchmarks 
      x = x*2 - ( y * 55 )   # I was doing data creation inside the function 
      y = x + y*2            # instead of passing it in as an array
      z = x + y + 99         # in any case, this redundant code seems to 
      z = z * ( z - .88 )    # have something to do with the code running
                             # faster.  without the redundant code, the 
      zz[j] = z              # numba and numpy functions are exactly the same.
   return zz

x = np.random.randn(nobs)
y = np.random.randn(nobs)
z = np.zeros(nobs)
res_numpy = proc_numpy(x,y,z)

z = np.zeros(nobs)
res_numba = proc_numba(x,y,z)

Results:

In [356]: np.all( res_numpy == res_numba )
Out[356]: True

In [357]: %timeit proc_numpy(x,y,z)
10000 loops, best of 3: 105 µs per loop

In [358]: %timeit proc_numba(x,y,z)
10000 loops, best of 3: 28.6 µs per loop

I ran this on a 2012 macbook air (13.3), standard anaconda distribution. I can provide more detail on my setup if it's relevant.

Recommended answer

I think this question highlights (somewhat) the limitations of calling out to precompiled functions from a higher level language. Suppose in C++ you write something like:

for (int i = 0; i != N; ++i) a[i] = b[i] + c[i] + 2 * d[i];

The compiler sees all this at compile time, the whole expression. It can do a lot of really intelligent things here, including optimizing out temporaries (and loop unrolling).

In python however, consider what's happening: when you use numpy, each '+' uses operator overloading on the np array types (which are just thin wrappers around contiguous blocks of memory, i.e. arrays in the low level sense), and calls out to a precompiled function (written in C, in numpy's case) which does the addition super fast. But it just does one addition, and spits out a temporary.
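To make the temporaries concrete, here is a minimal sketch of roughly how numpy evaluates the expression from the C++ line above. The names `t1`, `t2`, `a_expr` and `a_steps` are made up for illustration, and the array size is an arbitrary assumption; the point is only that each operator is its own pass over memory and allocates a fresh full-size array.

```python
import numpy as np

rng = np.random.default_rng(0)
b, c, d = (rng.standard_normal(10_000) for _ in range(3))

# What you write: a single expression.
a_expr = b + c + 2 * d

# Roughly what happens: one precompiled loop per operator,
# each one allocating a new full-size array for its result.
t1 = b + c         # pass 1 -> temporary
t2 = 2 * d         # pass 2 -> temporary
a_steps = t1 + t2  # pass 3 -> final result
```

Both forms give the same values; the second just spells out the intermediate arrays that the first creates implicitly.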

We can see that, while numpy is awesome, convenient and pretty fast, it is in some way slowing things down: it seems like it is calling into a fast compiled language for the hard work, but the compiler doesn't get to see the whole program; it's just fed isolated little bits. This is hugely detrimental to a compiler, especially modern compilers, which are very intelligent and, when the code is well written, can emit code that lets the CPU retire multiple instructions per cycle.

Numba, on the other hand, uses a JIT. So at runtime it can figure out that the temporaries are not needed and optimize them away. Basically, Numba has a chance to have the program compiled as a whole, whereas numpy can only call small atomic blocks which themselves have been precompiled.
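As a sketch of what "compiled as a whole" could look like for the question's code: the loop version below does all four steps per element in a single pass, so only scalars exist between steps and no intermediate arrays are allocated. `vectorized` and `fused` are illustrative names, not from the question, and the `try/except` fallback is an assumption added so the snippet still runs (slowly, as plain Python) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, fall back to the undecorated function.
    def njit(func):
        return func

def vectorized(x, y):
    # numpy version: every operator produces a full-size temporary array.
    x = x * 2 - (y * 55)
    y = x + y * 2
    z = x + y + 99
    return z * (z - .88)

@njit
def fused(xx, yy, out):
    # Single pass over the data; x, y and z are scalars, not arrays.
    for j in range(xx.shape[0]):
        x = xx[j] * 2 - (yy[j] * 55)
        y = x + yy[j] * 2
        z = x + y + 99
        out[j] = z * (z - .88)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000)
y = rng.standard_normal(1_000)
res_vec = vectorized(x, y)
res_fused = fused(x, y, np.empty_like(x))
```

When timing something like this, the first call to `fused` includes compilation, so it makes sense to call it once before measuring.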
