How to use multiple nodes/cores on a cluster with parallelized Python code

Question

I have a piece of Python code in which I use joblib and multiprocessing to run parts of the code in parallel. It runs without trouble on my desktop, where Task Manager shows it using all four cores and running the code in parallel.

I recently learned that I have access to an HPC cluster with more than 100 nodes of 20 cores each. The cluster uses SLURM as its workload manager.

The first question is: is it possible to run parallelized Python code on a cluster?

If it is possible,

  1. Does the Python code I have need to be changed at all to run on the cluster, and
  2. What #SBATCH instructions need to be put in the job submission file to tell it that the parallelized parts of the code should run on four cores (or is it four nodes)?

The cluster I have access to has the following attributes:

PARTITION      CPUS(A/I/O/T)       NODES(A/I)  TIMELIMIT      MEMORY  CPUS  SOCKETS CORES 
standard       324/556/16/896      34/60       5-00:20:00     46000+  8+    2       4+

Recommended answer

MPI is typically considered the de facto standard for high-performance computing. There are a few MPI bindings for Python:

  • MPI for Python
  • pyMPI
  • Boost.MPI has Python bindings.

There are also many frameworks for this (list).

Your code will need at least minimal changes, but they should not be extensive.

When you port to MPI, you can run a single process per core, and you will not need to use multiprocessing.
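To make the "one process per core" idea concrete, here is a minimal sketch using mpi4py (the package name of MPI for Python). It is not the asker's code: `process` and the task list are hypothetical stand-ins, and the round-robin split is only one possible way to divide work. Every process runs the same script, looks up its own rank, and handles only its share of the tasks. As an assumption for convenience, the sketch falls back to a single process when mpi4py is not installed.

```python
# Sketch: split independent tasks across MPI processes instead of
# using multiprocessing. Assumes mpi4py; falls back to one process.
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # index of this process, 0 .. size-1
    size = comm.Get_size()   # total number of processes in the job
except ImportError:
    comm = None
    rank, size = 0, 1


def tasks_for_rank(tasks, rank, size):
    """Return the round-robin share of `tasks` owned by `rank`."""
    return tasks[rank::size]


def process(task):
    """Hypothetical stand-in for the real per-task computation."""
    return task * task


def main():
    all_tasks = list(range(20))  # every process builds the same list
    local = [process(t) for t in tasks_for_rank(all_tasks, rank, size)]

    # Collect every process's partial results on rank 0.
    gathered = comm.gather(local, root=0) if comm is not None else [local]

    if rank == 0:
        flat = sorted(r for part in gathered for r in part)
        print(f"{size} process(es) computed {len(flat)} results")
        return flat
    return None


if __name__ == "__main__":
    main()
```

A script like this is normally started through an MPI launcher, for example `mpirun -n 4 python script.py`, or `srun python script.py` inside a SLURM job; which launcher applies depends on how MPI is set up on the cluster.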

So, for example, if you have 100 nodes with 24 cores each, you will run 2400 Python processes.
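The arithmetic above, and how it might map onto a job submission file, can be sketched as follows. This is an illustration under assumptions, not a tested configuration for any particular cluster: `--nodes`, `--ntasks-per-node` and `--partition` are standard sbatch options, but the script name `my_mpi_script.py` is hypothetical, and real job files usually also need site-specific lines (time limit, module loads) that are omitted here.

```python
# Sketch: compute the process count and build a minimal SLURM job
# script as text. The script name is a hypothetical placeholder.

def total_processes(nodes, cores_per_node):
    """One MPI process per core across all nodes."""
    return nodes * cores_per_node


def sbatch_script(nodes, tasks_per_node, script="my_mpi_script.py",
                  partition=None):
    """Return the text of a minimal job submission file."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
    ]
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")
    # srun starts one copy of the script per requested task.
    lines.append(f"srun python {script}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(total_processes(100, 24))  # 2400
    print(sbatch_script(100, 24, partition="standard"))
```

For the four-core case in the question, the same helper would be called as `sbatch_script(1, 4)`, which requests four tasks on a single node.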
