Parallelise python loop with numpy arrays and shared-memory


Problem description

I am aware of several questions and answers on this topic, but haven't found a satisfactory answer to this particular problem:

What is the easiest way to do a simple shared-memory parallelisation of a python loop where numpy arrays are manipulated through numpy/scipy functions?
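
For concreteness, here is a toy version of the kind of loop I mean; the array shapes and the particular scipy function are only placeholders, and every iteration is fully independent:

import numpy as np
from scipy import ndimage

data = np.random.rand(100, 512, 512)
result = np.empty_like(data)
for i in range(data.shape[0]):
    # each iteration reads and writes only its own slice
    result[i] = ndimage.gaussian_filter(data[i], sigma=2.0)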

I am not looking for the most efficient way; I just want something that is simple to implement and doesn't require a significant rewrite when the loop is not run in parallel, just like OpenMP in lower-level languages.

The best answer I've seen in this regard is this one, but it is a rather clunky approach: it requires one to express the loop as a function taking a single argument, plus several lines of shared-array-converting crud; it seems to require that the parallel function be called from __main__, and it doesn't work well from the interactive prompt (where I spend a lot of my time).

With all of Python's simplicity, is this really the best way to parallelise a loop? Really? This is something trivial to parallelise in OpenMP fashion.

I have painstakingly read through the opaque documentation of the multiprocessing module, only to find that it is so general that it seems suited to everything but simple loop parallelisation. I am not interested in setting up Managers, Proxies, Pipes, etc.; I just have a simple, fully parallel loop with no communication between tasks. Using MPI to parallelise such a simple situation seems like overkill, not to mention that it would be memory-inefficient in this case.

I haven't had time to learn about the multitude of different shared-memory parallel packages for Python, but was wondering if someone has more experience in this and can show me a simpler way. Please do not suggest serial optimisation techniques such as Cython (I already use it), or using parallel numpy/scipy functions such as BLAS (my case is more general, and more parallel).

Recommended answer

With Cython parallel support:

# asd.pyx
from cython.parallel cimport prange

import numpy as np

def foo():
    cdef int i, j, n

    x = np.zeros((200, 2000), float)

    n = x.shape[0]
    # prange distributes the iterations over OpenMP threads; the GIL is
    # re-acquired per iteration to call into numpy, which releases it
    # again internally while computing, so rows are processed in parallel
    for i in prange(n, nogil=True):
        with gil:
            for j in range(100):
                x[i,:] = np.cos(x[i,:])

    return x

On a 2-core machine:

$ cython asd.pyx
$ gcc -fPIC -fopenmp -shared -o asd.so asd.c -I/usr/include/python2.7
$ export OMP_NUM_THREADS=1
$ time python -c 'import asd; asd.foo()'
real    0m1.548s
user    0m1.442s
sys 0m0.061s

$ export OMP_NUM_THREADS=2
$ time python -c 'import asd; asd.foo()'
real    0m0.602s
user    0m0.826s
sys 0m0.075s

This runs fine in parallel, since np.cos (like other ufuncs) releases the GIL.

If you want to use this interactively:

# asd.pyxbld
def make_ext(modname, pyxfilename):
    # tell pyximport to compile and link with OpenMP
    from distutils.extension import Extension
    return Extension(name=modname,
                     sources=[pyxfilename],
                     extra_link_args=['-fopenmp'],
                     extra_compile_args=['-fopenmp'])

and (remove asd.so and asd.c first):

>>> import pyximport
>>> pyximport.install(reload_support=True)
>>> import asd
>>> q1 = asd.foo()
# Go to an editor and change asd.pyx
>>> reload(asd)
>>> q2 = asd.foo()

So yes, in some cases you can parallelize just by using threads. OpenMP is just a fancy wrapper for threading, and Cython is therefore only needed here for the easier syntax. Without Cython, you can use the threading module --- it works similarly to multiprocessing (and probably more robustly), but you don't need to do anything special to declare arrays as shared memory.
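
As a minimal sketch (not part of the original answer; it assumes, as above, that the work per row is dominated by a GIL-releasing ufunc such as np.cos), the same loop with the plain threading module might look like this:

import threading
import numpy as np

def worker(x, lo, hi):
    # all threads see the same array object, so writes to disjoint
    # row ranges are visible everywhere without any copying
    for i in range(lo, hi):
        for j in range(100):
            x[i, :] = np.cos(x[i, :])

x = np.zeros((200, 2000), float)
n = x.shape[0]
threads = [threading.Thread(target=worker, args=(x, k * n // 2, (k + 1) * n // 2))
           for k in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()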

However, not all operations release the GIL, so YMMV for the performance.

***

And another possibly useful link scraped from other Stackoverflow answers --- another interface to multiprocessing: http://packages.python.org/joblib/parallel.html
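
For reference, a rough sketch of what that interface looks like, using joblib's Parallel and delayed helpers; note that joblib dispatches to worker processes by default, so the rows are serialised to the workers rather than shared:

from joblib import Parallel, delayed
import numpy as np

x = np.zeros((200, 2000), float)
# one task per row; joblib collects the returned rows into a list
rows = Parallel(n_jobs=2)(delayed(np.cos)(x[i, :]) for i in range(x.shape[0]))
x = np.vstack(rows)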
