Multiple instances of Python running simultaneously limited to 35

Problem description

I am running a Python 3.6 script as multiple separate processes on different processors of a parallel computing cluster. Up to 35 processes run simultaneously with no problem, but the 36th (and any beyond it) crashes with a segmentation fault on the second line, import pandas as pd. Interestingly, the first line, import os, does not cause an issue. The full error message is:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max
Traceback (most recent call last):
  File "/home/.../myscript.py", line 32, in <module>
    import pandas as pd
  File "/home/.../python_venv2/lib/python3.6/site-packages/pandas/__init__.py", line 13, in <module>
    __import__(dependency)
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
    from . import add_newdocs
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/home/.../python_venv2/lib/python3.6/site-packages/numpy/core/__init__.py", line 16, in <module>
    from . import multiarray
SystemError: initialization of multiarray raised unreported exception
/var/spool/slurmd/job04590/slurm_script: line 11: 26963 Segmentation fault      python /home/.../myscript.py -x 38

Pandas and a few other packages are installed in a virtual environment. I have duplicated the virtual environment, so that there are no more than 24 processes running in each venv. For example, the error script above came from a script running in the virtual environment called python_venv2.

The problem occurs on the 36th process every time regardless of how many of the processes are importing from the particular instance of Pandas. (I am not even making a dent in the capacity of the parallel computing cluster.)

So, if it is not a restriction on the number of processes accessing Pandas, is it a restriction on the number of processes running Python? Why is 35 the limit?

Is it possible to install multiple copies of Python on the machine (in separate virtual environments?) so that I can run more than 35 processes?

Recommended answer

Breaking down the error message

Your error message includes the following hint:

OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 2067021 max

The RLIMIT_NPROC resource limit caps the number of processes a user can have. On Linux, every thread counts toward this limit. It is a per-process setting: when a process calls fork(), clone(), vfork(), etc., that process's own RLIMIT_NPROC value is compared with the total number of processes (threads, on Linux) already owned by the process's real user ID. If the limit would be exceeded, the call fails with EAGAIN ("Resource temporarily unavailable"), which is exactly what pthread_create reported here, and things shut down, as you've experienced.
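To see the limit OpenBLAS is reporting, you can query it from Python with the standard-library resource module (Unix only). The "current" and "max" values in the OpenBLAS message are the soft and hard limits respectively. This is a minimal sketch, and the helper name nproc_limits is only for illustration:

```python
import resource


def nproc_limits():
    """Return this process's (soft, hard) RLIMIT_NPROC values as printable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def show(value):
        # RLIM_INFINITY means "no limit"
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return show(soft), show(hard)


soft, hard = nproc_limits()
print(f"RLIMIT_NPROC {soft} current, {hard} max")
```

Run it inside the same job (for example, from the Slurm script) so you see the limits that the worker processes actually inherit.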

The error message indicates that OpenBLAS was unable to create additional threads because your user had already reached the limit set by RLIMIT_NPROC (1024, the "current" value in the message).
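If you want to check how close your user is to that limit, one rough approach on Linux is to sum the thread counts that /proc reports for every process owned by your user ID. This is only an approximation of the kernel's own count. /proc may hide other users' or other containers' processes, and the total can change while it is being taken. The helper name threads_owned_by is hypothetical:

```python
import os
import resource


def threads_owned_by(uid):
    """Approximate the number of threads owned by `uid` by summing the
    'Threads:' field of every readable /proc/<pid>/status (Linux only)."""
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(f"/proc/{entry}/status", errors="replace") as status:
                fields = dict(line.split(":", 1) for line in status if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning, or is unreadable
        uid_field, threads_field = fields.get("Uid"), fields.get("Threads")
        if uid_field is None or threads_field is None:
            continue
        if int(uid_field.split()[0]) == uid:  # the first Uid column is the real UID
            total += int(threads_field)
    return total


uid = os.getuid()
soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"UID {uid} owns about {threads_owned_by(uid)} threads; RLIMIT_NPROC soft limit is {soft}")
```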

Since you're running on a cluster, your user is unlikely to be running many threads of its own (unlike, say, on a personal machine where you might be browsing the web, playing music, etc.). It's therefore reasonable to conclude that OpenBLAS itself is trying to start many threads.

How OpenBLAS uses threads

OpenBLAS can use multiple threads to accelerate linear algebra. You may want many threads for solving a single, larger problem quickly. You may want fewer threads for solving many smaller problems simultaneously.

OpenBLAS has several ways to limit the number of threads it uses. These are controlled via:

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
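If changing the job script isn't convenient, the same variables can be set from inside the Python script. This only works if it happens before numpy (and therefore pandas) is imported for the first time, because OpenBLAS typically reads these variables when the shared library is loaded. This is a minimal sketch, and the helper name limit_blas_threads is hypothetical:

```python
import os


def limit_blas_threads(n=1):
    """Set the OpenBLAS/OpenMP thread-count variables for this process.

    Call this before numpy or pandas is imported for the first time;
    setting the variables after OpenBLAS has loaded has no effect on it.
    """
    for name in ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[name] = str(n)


limit_blas_threads(1)
# Only now import the libraries that load OpenBLAS, e.g.:
# import pandas as pd
```

Child processes started afterwards inherit these values as well.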

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS. (I think this means that OPENBLAS_NUM_THREADS overrides OMP_NUM_THREADS; however, OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.)

If none of the foregoing variables are set, OpenBLAS will run using a number of threads equal to the number of cores on the machine (32 in your case).
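The precedence rules and the default can be summarised as a small function. This is a hypothetical model written from the rules stated above, not OpenBLAS's own code. In particular, treating non-numeric or non-positive values as unset is an assumption of this sketch:

```python
from typing import Mapping


def effective_blas_threads(env: Mapping[str, str], cores: int, use_openmp: bool = False) -> int:
    """Model of how many threads OpenBLAS would use, given environment `env`."""
    if use_openmp:
        # Built with USE_OPENMP=1: OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS are ignored.
        names = ("OMP_NUM_THREADS",)
    else:
        # Highest priority first.
        names = ("OPENBLAS_NUM_THREADS", "GOTO_NUM_THREADS", "OMP_NUM_THREADS")
    for name in names:
        value = env.get(name, "").strip()
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Nothing usable set: one thread per core.
    return cores


print(effective_blas_threads({}, cores=32))  # 32
print(effective_blas_threads({"OMP_NUM_THREADS": "4", "OPENBLAS_NUM_THREADS": "1"}, cores=32))  # 1
```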

Your situation

Your cluster has 32-core CPUs. You're trying to run 36 instances of Python. Each instance requires 1 thread for Python plus 32 threads for OpenBLAS. You'll also need 1 thread for your SSH connection and 1 thread for your shell. That means you need roughly 36*(32+1)+2 = 1190 threads, well above the RLIMIT_NPROC soft limit of 1024 shown in the error message.

The nuclear option for fixing the problem is to use:

export OPENBLAS_NUM_THREADS=1

which should bring you down to 36*(1+1)+2=74 threads.
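The arithmetic above is easy to redo for other process counts or thread settings. The small sketch below encodes the same estimate. It is the answer's back-of-the-envelope model, not an exact count of what the kernel sees:

```python
def threads_needed(processes, blas_threads, overhead=2):
    """Estimate from the answer: each Python process uses 1 thread of its own
    plus `blas_threads` OpenBLAS threads; `overhead` covers SSH and the shell."""
    return processes * (blas_threads + 1) + overhead


SOFT_LIMIT = 1024  # "RLIMIT_NPROC 1024 current" in the error message

for blas in (32, 1):
    need = threads_needed(36, blas)
    print(f"OPENBLAS_NUM_THREADS={blas}: {need} threads, under limit: {need <= SOFT_LIMIT}")
# OPENBLAS_NUM_THREADS=32: 1190 threads, under limit: False
# OPENBLAS_NUM_THREADS=1: 74 threads, under limit: True
```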

Since you have spare capacity, you could adjust OPENBLAS_NUM_THREADS to a higher value, but then the OpenBLAS instances owned by your separate Python processes will interfere with each other. So there's a trade-off between how fast you get one solution versus how fast you can get many solutions. Ideally, you can solve this trade-off by running fewer Pythons per node and using more nodes.
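One way to manage that trade-off on a single node is to divide the node's cores evenly among the Python processes you start there. Each process's OpenBLAS pool then gets its own share instead of all of them trying to use every core. This is a sketch under assumptions. The function names are hypothetical, and the "-x <index>" argument only mirrors the command line shown in the question; your script may expect something else:

```python
import os
import subprocess
import sys


def blas_threads_per_process(cores, processes):
    """Give each Python process an equal share of the node's cores (at least 1)."""
    return max(1, cores // processes)


def worker_env(cores, processes):
    """Copy of the current environment with OpenBLAS limited to its share."""
    share = str(blas_threads_per_process(cores, processes))
    return dict(os.environ, OPENBLAS_NUM_THREADS=share, OMP_NUM_THREADS=share)


def launch_workers(script, processes, cores):
    """Start `processes` copies of `script` on this node and wait for them.

    Passing "-x <index>" is an assumption about how the script takes its input.
    """
    env = worker_env(cores, processes)
    children = [
        subprocess.Popen([sys.executable, script, "-x", str(i)], env=env)
        for i in range(processes)
    ]
    return [child.wait() for child in children]
```

For example, launch_workers("myscript.py", 4, 32) would run four processes with 8 OpenBLAS threads each. With 36 processes on 32 cores, each process falls back to a single OpenBLAS thread.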
