Using the multiprocessing module for cluster computing

Question

I'm interested in running a Python program on a computer cluster. In the past I have used Python MPI interfaces, but because they are difficult to compile and install, I would prefer a solution that uses built-in modules, such as Python's multiprocessing module.

What I would really like to do is set up a multiprocessing.Pool instance that spans the whole computer cluster and then run a Pool.map(...). Is this possible, and is it easy to do?

If this is impossible, I'd like to at least be able to start Process instances on any of the nodes from a central script, with different parameters for each node.

Recommended answer

If by cluster computing you mean distributed-memory systems (multiple nodes rather than SMP), then Python's multiprocessing may not be a suitable choice. It can spawn multiple processes, but they will still be bound within a single node.
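As a rough illustration of that limitation, the sketch below asks each worker in a multiprocessing.Pool which host it is running on. The helper name hosts_seen_by_pool and the worker count are assumptions made for this example only. Because the pool starts its workers as child processes of the calling script, the set it returns should contain just the local hostname, however many workers are requested.

```python
import multiprocessing
import socket


def hosts_seen_by_pool(workers=4):
    """Return the set of hostnames reported by the pool's workers.

    hosts_seen_by_pool is a hypothetical helper written for this sketch.
    """
    with multiprocessing.Pool(processes=workers) as pool:
        # socket.gethostname takes no arguments, so submit it once per worker slot.
        pending = [pool.apply_async(socket.gethostname) for _ in range(workers)]
        return {result.get(timeout=30) for result in pending}


if __name__ == "__main__":
    # Expect a single entry: the machine this script was started on.
    print(hosts_seen_by_pool())
```

Running the same script on a cluster login node gives the same kind of result: the pool has no knowledge of the other nodes.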

What you will need is a framework that handles spawning processes across multiple nodes and provides a mechanism for communication between them (pretty much what MPI does).
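To make the "communication mechanism" part concrete, here is a minimal sketch of message passing between a coordinator and a worker using the standard library's multiprocessing.connection module. It is only an illustration under stated assumptions: both ends run on localhost in one script, the worker is a plain function call rather than a process on another node, and the names AUTHKEY, serve_one_task, worker and run_demo are hypothetical. It does not launch anything on remote machines, which is the part a framework such as MPI takes care of.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret, for illustration only


def serve_one_task(listener, results):
    """Coordinator side: hand one task to the first worker that connects."""
    with listener.accept() as conn:
        conn.send({"n": 12})
        results.append(conn.recv())


def worker(address):
    """Worker side: on a real cluster this would run on another node."""
    with Client(address, authkey=AUTHKEY) as conn:
        task = conn.recv()
        conn.send(task["n"] ** 2)


def run_demo():
    results = []
    # Port 0 lets the OS pick a free port; listener.address reports the choice.
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        server = threading.Thread(target=serve_one_task, args=(listener, results))
        server.start()
        worker(listener.address)
        server.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Everything this sketch leaves out (starting workers on each node, scheduling, handling failures) is what the frameworks discussed in this answer provide.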

See the page on Parallel Processing on the Python wiki for a list of frameworks that help with cluster computing.

From that list, pp, jug, pyro and celery look like sensible options, although I can't personally vouch for any of them since I have no experience with them (I mainly use MPI).

If ease of installation and use is important, I would start by exploring jug. It's easy to install, supports common batch cluster systems, and looks well documented.
