Understanding the difference between vectorizing in Numpy and multi-threading of vectorized expression via Numexpr

Problem description

I am struggling a little bit with the concept that NumPy is said to be vectorizing its arithmetic array operations: Does it overcome Python's GIL since part of NumPy is implemented in C? Also, how does Numexpr work then? If I understand correctly, it runs code through an optimizing JIT and enables multi-threading and thereby overcomes Python's GIL.

And isn't "true" vectorization more like multi-processing instead of multi-threading?

Answer

NumPy may in some cases use a library that uses multiple processes to do the processing and thus spreads the work across several cores. This, however, depends on the library and does not have much to do with the Python code in NumPy. So, yes, NumPy and any other library may overcome these restrictions if they are not written in Python. There are even some libraries offering GPU-accelerated functions.

NumExpr uses the same method to bypass the GIL. From their homepage:

In addition, numexpr implements support for multi-threaded computation straight into its internal virtual machine, written in C. This allows bypassing the GIL in Python.

However, there are some fundamental differences between NumPy and NumExpr. NumPy concentrates on providing a good Pythonic interface for array operations, whereas NumExpr has a much narrower scope and its own expression language. When NumPy performs the calculation c = 3*a + 4*b where the operands are arrays, two temporary arrays (3*a and 4*b) are created in the process. In this case NumExpr may optimize the calculation so that the multiplications and addition are performed element by element without materializing any intermediate results.
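To make the difference concrete, here is a pure-NumPy sketch of the blocking idea; `blocked_eval` is a hypothetical helper for illustration only, not NumExpr's actual implementation:

```python
import numpy as np

a = np.random.random(1_000_000)
b = np.random.random(1_000_000)

# NumPy evaluates the expression in steps: 3*a and 4*b are materialized
# as full-size temporary arrays before the final addition produces c.
c_numpy = 3*a + 4*b

# What NumExpr does internally (simplified): walk the arrays in small
# blocks so the intermediates stay cache-sized and no full-size
# temporary array is ever allocated.
def blocked_eval(a, b, block=4096):
    out = np.empty_like(a)
    for i in range(0, a.size, block):
        s = slice(i, i + block)
        out[s] = 3*a[s] + 4*b[s]   # intermediates are only block-sized
    return out

c_blocked = blocked_eval(a, b)
assert np.allclose(c_numpy, c_blocked)
```

The real NumExpr additionally compiles the expression string for its virtual machine and splits the blocks across threads; this sketch only shows the temporary-array aspect.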

This leads to some interesting results with NumPy. The following tests were carried out on a 4-core, 8-thread i7 processor, and the timing was done with IPython's %timeit:

import numpy as np
import numexpr as ne

def addtest_np(a, b): a + b
def addtest_ne(a, b): ne.evaluate("a+b")

def addtest_np_inplace(a, b): a += b
def addtest_ne_inplace(a, b): ne.evaluate("a+b", out=a)

def addtest_np_constant(a): a + 3
def addtest_ne_constant(a): ne.evaluate("a+3")

def addtest_np_constant_inplace(a): a += 3
def addtest_ne_constant_inplace(a): ne.evaluate("a+3", out=a)

a_small = np.random.random((100,10))
b_small = np.random.random((100,10))

a_large = np.random.random((100000, 1000))
b_large = np.random.random((100000, 1000))

# results: (time given is in nanoseconds per element with small/large array)
# np: NumPy
# ne8: NumExpr with 8 threads
# ne1: NumExpr with 1 thread
#
# a+b:
#  np: 2.25 / 4.01
#  ne8: 22.6 / 3.22
#  ne1: 22.6 / 4.21
# a += b:
#  np: 1.5 / 1.26 
#  ne8: 36.8 / 1.18
#  ne1: 36.8 / 1.48

# a+3:
#  np: 4.8 / 3.62
#  ne8: 10.9 / 3.09
#  ne1: 20.2 / 4.04
# a += 3:
#  np: 3.6 / 0.79
#  ne8: 34.9 / 0.81
#  ne1: 34.4 / 1.06

Of course, the timing method used here is not very accurate, but certain general trends emerge:

  • NumPy uses fewer clock cycles (np < ne1)
  • Parallel processing helps with very large arrays (by 10% to 20%)
  • NumExpr is much slower with small arrays
  • NumPy is very strong with in-place operations

NumPy does not parallelize simple arithmetic operations, but as the numbers above show, that does not really matter: the speed is mostly limited by memory bandwidth, not by processing power.

If we do something more complicated, things change.

np.sin(a_large)               # 19.4 ns/element
ne.evaluate("sin(a_large)")   # 5.5 ns/element

The speed is no longer limited by the memory bandwidth. To see if this really is due to threading (and not due to NumExpr sometimes using some fast libraries):

ne.set_num_threads(1)
ne.evaluate("sin(a_large)")    # 34.3 ns/element

Here parallelism really helps a lot.

NumPy may use parallel processing for more complicated linear-algebra operations, such as matrix inversion. These operations are not supported by NumExpr, so there is no meaningful comparison. The actual speed depends on the library used (BLAS/ATLAS/LAPACK). Likewise, when performing complex operations such as an FFT, the performance depends on the library. (AFAIK, NumPy/SciPy does not have FFTW support yet.)
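For example, a matrix inversion goes straight to the installed LAPACK/BLAS backend (np.linalg.inv is real NumPy API; the matrix below is just an illustrative well-conditioned example):

```python
import numpy as np

# np.linalg.inv delegates to the installed LAPACK/BLAS backend, which may
# itself be multi-threaded; NumExpr offers no equivalent operation.
n = 500
A = np.random.random((n, n)) + n * np.eye(n)  # diagonally dominant, so well-conditioned
A_inv = np.linalg.inv(A)

# Sanity check: A @ A_inv should be numerically the identity matrix.
assert np.allclose(A @ A_inv, np.eye(n))
```

Whether this call runs on one core or several depends entirely on which BLAS build NumPy was linked against (e.g. OpenBLAS or MKL), not on NumPy itself.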

As a summary, there are cases where NumExpr is very fast and useful, and there are cases where NumPy is fastest. If you have large arrays and element-wise operations, NumExpr is very strong. However, it should be noted that some parallelism (or even spreading the calculations across computers) is often quite easy to incorporate into the code with multiprocessing or something equivalent.

The question about "multi-processing" versus "multi-threading" is a bit tricky, as the terminology is somewhat wobbly. In Python, a "thread" is something that runs under the same GIL, but if we talk about operating-system threads and processes, there may not be much difference between the two. For example, in Linux there is little difference between them.
