Why does multiprocessing use only a single core after I import numpy?

Question

I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.

I've been trying to parallelise a CPU-heavy for loop using joblib, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.

Here's a very trivial example...

from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in range(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    # generate one (2, 100) array per iteration
    data = (np.random.randn(2, 100) for ii in range(niter))
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()

...and here's what I see in htop while this script is running (screenshot omitted: all of the worker processes are piled onto a single core):

I'm running Ubuntu 12.10 (3.5.0-26) on a laptop with 4 cores. Clearly joblib.Parallel is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?

Answer

After some more googling I found the answer here.

It turns out that certain Python modules (numpy, scipy, tables, pandas, skimage...) mess with core affinity on import. As far as I can tell, this problem seems to be specifically caused by them linking against multithreaded OpenBLAS libraries.
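A quick way to check whether an import has clobbered the affinity mask is to read it back from the kernel before and after the import. This is a minimal sketch assuming Linux and Python 3.3+, where `os.sched_getaffinity` is available:

```python
import os

def allowed_cpus():
    # The set of CPU indices this process is currently allowed to run on
    # (Linux-only; available since Python 3.3).
    return os.sched_getaffinity(0)

before = allowed_cpus()
# ... place the suspect import here, e.g. `import numpy` ...
after = allowed_cpus()

# With no BLAS-linked import in between the mask is unchanged; an
# affinity-resetting import would shrink `after` to a single core.
print("before:", sorted(before))
print("after: ", sorted(after))
```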

One way to work around the problem is to reset the task affinity of the process using

os.system("taskset -p 0xff %d" % os.getpid())

With this line pasted in after the module imports, my example now runs on all cores:


My experience so far has been that this doesn't seem to have any negative effect on numpy's performance, although this is probably machine- and task-specific.
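On Python 3.3+ the same reset can also be done in-process, without shelling out to `taskset`. This is a sketch of a Linux-only alternative, not part of the original answer:

```python
import multiprocessing
import os

# Re-allow this process to run on every core the machine reports,
# mirroring the effect of `taskset -p 0xff <pid>`.
os.sched_setaffinity(0, range(multiprocessing.cpu_count()))
print(os.sched_getaffinity(0))
```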

There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable OPENBLAS_MAIN_FREE (or GOTOBLAS_MAIN_FREE), for example

OPENBLAS_MAIN_FREE=1 python myscript.py
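The variable can also be set from inside the script itself, provided this happens before the first import of a BLAS-linked module, since OpenBLAS reads it when the library is loaded. A small sketch:

```python
import os

# Must run before `import numpy` (or scipy, pandas, ...) so that OpenBLAS
# sees the variable when it is first loaded.
os.environ["OPENBLAS_MAIN_FREE"] = "1"

# import numpy as np  # safe to import from this point on
```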

Or alternatively, if you're compiling OpenBLAS from source you can permanently disable it at build-time by editing the Makefile.rule to contain the line

NO_AFFINITY=1

