Multiple instances of Python running simultaneously limited to 35


Question

I am running a Python 3.6 script as multiple separate processes on different processors of a parallel computing cluster. Up to 35 processes run simultaneously with no problem, but the 36th (and any beyond it) crashes with a segmentation fault on the second line, which is import pandas as pd. Interestingly, the first line, import os, does not cause an issue. The full error message is:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
Traceback (most recent call last):
  File "/home/.../myscript.py", line 32, in <module>
    import pandas as pd
  File "/home/.../python_venv2/lib/python3.6/site-packages/pandas/__init__.py", line 13, in <module>
    __import__(dependency)
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/var/spool/slurmd/job04590/slurm_script: line 11: 26963 Segmentation fault      python /home/.../myscript.py -x 38

Pandas and a few other packages are installed in a virtual environment. I have duplicated the virtual environment so that no more than 24 processes run in each venv. For example, the error above came from a script running in the virtual environment called python_venv2.

The problem occurs on the 36th process every time, regardless of how many of the processes are importing from a particular instance of Pandas. (I am not even making a dent in the capacity of the parallel computing cluster.)

So, if it is not a restriction on the number of processes accessing Pandas, is it a restriction on the number of processes running Python? Why is 35 the limit?

Is it possible to install multiple copies of Python on the machine (in separate virtual environments?) so that I can run more than 35 processes?

Recommended answer

Breaking down the error message

Your error message includes the following hint:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max

RLIMIT_NPROC limits the total number of processes a user can have (on Linux, threads count toward this total). More specifically, it is a per-process setting: when a process calls fork(), clone(), vfork(), etc., that process's RLIMIT_NPROC value is compared with the total number of processes owned by the process's user. If the count would exceed the limit, the call fails, which is what you've experienced.

The error message indicates that OpenBLAS was unable to create additional threads because your user had already used all the threads RLIMIT_NPROC allows.
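To see the same two numbers that OpenBLAS printed, you can query the limit from Python. This is a minimal sketch using the standard-library resource module (Unix only); the helper names nproc_limits and describe are made up for illustration.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, hard


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = nproc_limits()
# Mirrors the "current"/"max" wording of the OpenBLAS message.
print(f"RLIMIT_NPROC {describe(soft)} current, {describe(hard)} max")
```

The soft ("current") value is the one that is enforced. A process may normally raise its own soft limit up to the hard ("max") value, for example with ulimit -u in the shell that launches the job, although a cluster administrator or the scheduler may restrict this.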

Since you're running on a cluster, it's unlikely that your user is running many other threads (unlike, say, on a personal machine where you're browsing the web, playing music, etc.), so it's reasonable to conclude that OpenBLAS itself is trying to start many threads.

How OpenBLAS uses threads

OpenBLAS can use multiple threads to accelerate linear algebra. You may want many threads for solving a single, larger problem quickly. You may want fewer threads for solving many smaller problems simultaneously.

OpenBLAS has several ways to limit the number of threads it uses. These are controlled via environment variables:

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4

The order of precedence is OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS. (I think this means that OPENBLAS_NUM_THREADS overrides OMP_NUM_THREADS; however, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.)

If none of the foregoing variables is set, OpenBLAS will run with a number of threads equal to the number of cores on your machine (32 on your machine).
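If editing the job script is inconvenient, the same variables can be set from inside the Python script. The sketch below assumes that OpenBLAS reads them when the library is loaded, so they must be set before the first import of numpy or pandas; limit_blas_threads is a hypothetical helper name, and setting all three variables is a belt-and-braces choice rather than a requirement.

```python
import os

# The three variables named above, in their stated order of precedence.
BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n=1):
    """Set every thread-count variable to n and return what was set."""
    value = str(n)
    for name in BLAS_THREAD_VARS:
        os.environ[name] = value
    return {name: os.environ[name] for name in BLAS_THREAD_VARS}


# Must run before numpy/pandas are imported (assumption: the variables
# are only consulted when the OpenBLAS library is first loaded).
limit_blas_threads(1)
# import pandas as pd   # import only after the variables are set
```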

Your situation

Your cluster has 32-core CPUs. You're trying to run 36 instances of Python. Each instance requires 1 thread for Python + 32 threads for OpenBLAS. You'll also need 1 thread for your SSH connection and 1 thread for your shell. That means you need 36*(32+1)+2=1190 threads, which is more than the RLIMIT_NPROC soft limit of 1024 shown in the error message.

The nuclear option for fixing the problem is to set:

export OPENBLAS_NUM_THREADS=1

which should bring you down to 36*(1+1)+2=74 threads.
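The arithmetic can be checked with a few lines of Python. This is only a restatement of the estimate above: estimate_threads is a made-up name, and the overhead of 2 stands for the SSH connection and the shell.

```python
def estimate_threads(instances, blas_threads, overhead=2):
    """Threads used by `instances` Python processes, each with one
    interpreter thread plus `blas_threads` OpenBLAS threads, plus a
    fixed overhead (here: SSH connection + shell)."""
    return instances * (blas_threads + 1) + overhead


SOFT_LIMIT = 1024  # the "current" RLIMIT_NPROC value from the error message

for blas_threads in (32, 1):
    total = estimate_threads(36, blas_threads)
    status = "over" if total > SOFT_LIMIT else "within"
    print(f"{blas_threads:>2} OpenBLAS threads -> {total} total ({status} the limit)")
```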

Since you have spare capacity, you could set OPENBLAS_NUM_THREADS to a higher value, but then the OpenBLAS instances owned by your separate Python processes will compete with each other for cores. So there's a trade-off between how fast you get one solution and how fast you can get many solutions. Ideally, you can resolve this trade-off by running fewer Pythons per node and using more nodes.
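One simple way to pick a value, offered as a rule of thumb rather than anything OpenBLAS prescribes, is to divide the cores on a node evenly among the Python processes running there so that the total thread count does not exceed the core count. The function name below is hypothetical.

```python
def blas_threads_per_process(cores_per_node, processes_per_node):
    """Split a node's cores evenly across processes, never below 1 thread."""
    if cores_per_node < 1 or processes_per_node < 1:
        raise ValueError("both arguments must be positive")
    return max(1, cores_per_node // processes_per_node)


# 36 processes on one 32-core node leaves room for only 1 thread each;
# 8 processes per node (spread over more nodes) allows 4 threads each.
print(blas_threads_per_process(32, 36))
print(blas_threads_per_process(32, 8))
```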
