Why does multiprocessing use only a single core after I import numpy?
Question
I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.
I've been trying to parallelise a CPU-heavy for loop using joblib, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.
Here's a very trivial example...
from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in xrange(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    data = (np.random.randn(2, 100) for ii in xrange(niter))
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()
...and here's what I see in htop while this script is running:

[htop screenshot: all worker processes sharing a single core]
I'm running Ubuntu 12.10 (3.5.0-26) on a laptop with 4 cores. Clearly joblib.Parallel is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?
Answer
After some more googling I found the answer here.
It turns out that certain Python modules (numpy, scipy, tables, pandas, skimage...) mess with core affinity on import. As far as I can tell, this problem seems to be specifically caused by them linking against multithreaded OpenBLAS libraries.
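On Linux you can observe this clobbering directly from Python, without watching htop. A minimal sketch, assuming Python 3.3+ on Linux and an OpenBLAS build that exhibits the affinity reset (most modern builds no longer do):

```python
import os

# CPUs this process is currently allowed to run on.
before = os.sched_getaffinity(0)
print("before import:", sorted(before))

import numpy  # an affinity-resetting OpenBLAS build shrinks the set here

after = os.sched_getaffinity(0)
print("after import: ", sorted(after))
```

If the second set is a single CPU, you are hitting this bug, and every worker process forked afterwards inherits that one-core mask.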
A workaround is to reset the task affinity using

os.system("taskset -p 0xff %d" % os.getpid())

With this line pasted in after the module imports, my example now runs on all cores:

[htop screenshot: worker processes spread across all cores]
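Since Python 3.3 the same reset can be done without shelling out to taskset, via os.sched_setaffinity (Linux-only). This is a sketch, not part of the original answer:

```python
import os

# Re-pin this process to every core on the machine, undoing any
# affinity mask installed by an imported BLAS library (Linux-only).
os.sched_setaffinity(0, range(os.cpu_count()))
print(sorted(os.sched_getaffinity(0)))
```

Unlike the hard-coded 0xff mask (which covers only the first 8 CPUs), this adapts to however many cores the machine has.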
My experience so far has been that this doesn't seem to have any negative effect on numpy's performance, although this is probably machine- and task-specific.
There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable OPENBLAS_MAIN_FREE (or GOTOBLAS_MAIN_FREE), for example
OPENBLAS_MAIN_FREE=1 python myscript.py
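If modifying the launch command is inconvenient, the variable can also be set from inside the script, as long as that happens before the library is loaded. A sketch assuming OpenBLAS reads the variable when it is first loaded (i.e. at the first numpy import):

```python
import os

# Must be in the environment *before* OpenBLAS is loaded, so set it
# at the very top of the script, ahead of any numpy/scipy imports.
os.environ['OPENBLAS_MAIN_FREE'] = '1'

import numpy as np  # OpenBLAS now leaves the CPU affinity alone
```

Setting the variable after numpy has already been imported has no effect, since the library only consults its environment once.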
Or alternatively, if you're compiling OpenBLAS from source, you can permanently disable it at build-time by editing Makefile.rule to contain the line
NO_AFFINITY=1