Parallelism in Python
Question
What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:
- Message passing processes, possibly distributed across a cluster, e.g. MPI.
- Explicit shared memory parallelism, using pthreads or fork(), pipe(), et al.
- Implicit shared memory parallelism, using OpenMP.
Deciding on an approach to use is an exercise in trade-offs.
In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.
In short, what do I need to know about the different parallelization strategies in Python before choosing between them?
Answer
Generally, you are describing a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.
Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.
You can write your performance critical code in C or Cython, and use Python for the glue.
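For the simplest form of that glue, the standard-library ctypes module can call an existing compiled C library directly, with no extension module to build. This sketch assumes a POSIX system where the C math library can be located; on Windows the lookup would differ:

```python
import ctypes
import ctypes.util

# Locate and load the C math library (lookup is platform-dependent;
# on glibc Linux this typically resolves to libm.so.6).
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# Declare the C signature of sqrt so ctypes converts the
# argument and return value as C doubles.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # calls the C library's sqrt, not math.sqrt
```

For real performance-critical inner loops you would compile your own C (or Cython) function and call it the same way, keeping the orchestration in Python.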