Importing scipy breaks multiprocessing support in Python


Question

I am running into a bizarre problem that I can't explain. I'm hoping someone out there can help please!

I'm running Python 2.7.3 and Scipy v0.14.0 and am trying to implement some very simple multiprocessing algorithms to speed up my code using the multiprocessing module. I've managed to make a basic example work:

import multiprocessing
import numpy as np
import time
# import scipy.special


def compute_something(t):
    a = 0.
    for i in range(100000):
        a = np.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

This works fine and returns:

Pool size: 8
Built-in: 1.56904006004
Pool    : 0.447728157043

But if I uncomment the line import scipy.special, I get:

Pool size: 8
Built-in: 1.58968091011
Pool    : 1.59387993813

and I can see that only one core is doing the work on my system. In fact, importing any module from the scipy package seems to have this effect (I've tried several).

Any ideas? I've never seen a case like this before, where an apparently innocuous import can have such a strange and unexpected effect.

Thanks!

Update (1)

Moving the scipy import line to the function compute_something partially improves the problem:

Pool size: 8
Built-in: 1.66807389259
Pool    : 0.596321105957

Update (2)

Thanks to @larsmans for testing on a different system. Problem was not confirmed using Scipy v.0.12.0. Moving this query to the scipy mailing list and will post any answers.

Answer

After much digging around and posting an issue on the Scipy GitHub site, I've found a solution.

Before I start, this is documented very well here - I'll just give an overview.

This problem is not related to the version of Scipy or Numpy that I was using. It originates in the system BLAS libraries that Numpy and Scipy use for various linear algebra routines. You can tell which libraries Numpy is linked to by running

python -c 'import numpy;numpy.show_config()'

If you are using OpenBLAS in Linux, you may find that the CPU affinity is set to 1, meaning that once these algorithms are imported in Python (via Numpy/Scipy), you can access at most one core of the CPU. To test this, within a Python terminal run

import os
os.system('taskset -p %s' % os.getpid())

If the CPU affinity is returned as f or ff, you can access multiple cores. In my case it would start like that, but upon importing numpy or scipy.any_module, it would switch to 1, hence my problem.
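
On Python 3.3+ the same check can be done without shelling out to taskset, via os.sched_getaffinity (Linux-only); a minimal sketch:

```python
import os

# Linux-only: returns the set of CPU cores this process may run on.
# taskset's hex mask "1" corresponds to a one-element set here,
# while "f" or "ff" corresponds to 4 or 8 cores respectively.
allowed = os.sched_getaffinity(0)  # 0 means "the current process"
print("Process may run on %d core(s): %s" % (len(allowed), sorted(allowed)))
```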

I've found two solutions:

Change the CPU affinity

You can manually set the CPU affinity of the master process at the top of the main function so that the code looks like this:

import multiprocessing
import numpy as np
import math
import time
import os

def compute_something(t):
    a = 0.
    for i in range(10000000):
        a = math.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    os.system('taskset -cp 0-%d %s' % (pool_size, os.getpid()))

    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

Note that selecting a value higher than the number of cores for taskset doesn't seem to matter - it just uses the maximum possible number.
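
If shelling out to taskset is undesirable, Python 3.3+ also exposes os.sched_setaffinity to reset the mask directly (Linux-only). A hedged sketch that reads the current mask and writes it back; substituting range(os.cpu_count()) for `mask` would reclaim every core, assuming your environment actually makes them available:

```python
import os

# Read the current affinity mask, then write it back via sched_setaffinity.
# Replacing `mask` with range(os.cpu_count()) would undo OpenBLAS's pinning,
# provided those cores are available to this process.
mask = os.sched_getaffinity(0)
os.sched_setaffinity(0, mask)  # 0 means "the current process"
print("Affinity set to:", sorted(mask))
```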

Switch BLAS libraries

Solution documented at the site linked above. Basically: install libatlas and run update-alternatives to point numpy to ATLAS rather than OpenBLAS.
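
Newer OpenBLAS builds also expose an environment variable, OPENBLAS_MAIN_FREE=1, which tells the library not to pin the main thread's affinity; it must be set before numpy (and hence OpenBLAS) is first imported, so a switch of BLAS library may not be necessary. A sketch, with the numpy import left commented since the variable only takes effect once OpenBLAS loads:

```python
import os

# Must be set before numpy/scipy (and hence OpenBLAS) is first imported;
# setting it afterwards has no effect on an already-loaded library.
os.environ['OPENBLAS_MAIN_FREE'] = '1'

# import numpy as np  # safe to import now; the affinity mask is left untouched
```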
