Python multiprocessing: dealing with 2000 processes

Problem description

Following is my multiprocessing code. regressTuple has around 2000 items, so the code below creates around 2000 parallel processes. My Dell XPS 15 laptop crashes when this is run.

  1. Can't the Python multiprocessing library handle the queue according to hardware availability and run the program without crashing, in minimal time? Am I not doing this correctly?
  2. Is there an API call in Python to get the possible hardware process count?
  3. How can I refactor the code to take the parallel thread count from an input variable (instead of hard-coding it) and loop through the work in batches until completion? That way, after a few experiments, I would be able to find the optimal thread count.
  4. What is the best way to run this code in minimal time without crashing? (I cannot use multi-threading in my implementation.)

Here is my code:

from multiprocessing import Process

# regressList and runRegressWriteStatus come from the surrounding program
regressTuple = [(x,) for x in regressList]
processes = []

# Create one Process object per item (~2000 in total)
for i in range(len(regressList)):
    processes.append(Process(target=runRegressWriteStatus, args=regressTuple[i]))

# Start all of them at once ...
for process in processes:
    process.start()

# ... then wait for each one to finish
for process in processes:
    process.join()

Recommended answer

There are a number of things we need to keep in mind:

  1. The number of processes you can spawn is not limited by the number of cores on your system, but by the ulimit for your user id, which controls the total number of processes that can be launched under that user id. You can check this limit as shown below.
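
On Unix-like systems you can inspect that per-user limit from Python. A minimal sketch (resource.RLIMIT_NPROC is available on Linux and BSD, but not on every platform):

import resource

# Per-user process limit (Unix only); the same value `ulimit -u` reports
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"nproc soft limit: {soft}, hard limit: {hard}")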

The number of cores determines how many of those launched processes can actually run in parallel at any one time.
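
This also answers question 2. A sketch of the standard ways to query the core count from Python (os.sched_getaffinity is Linux-only):

import os
import multiprocessing

print(os.cpu_count())                # logical cores visible to the OS
print(multiprocessing.cpu_count())   # the same value, via multiprocessing
print(len(os.sched_getaffinity(0)))  # cores this process may use (Linux only)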

Your system may be crashing because the target function these processes run is doing something heavy and resource-intensive, which the system cannot handle when multiple processes run simultaneously, or because the nprocs limit on the system has been exhausted and the kernel is no longer able to spin up new processes.

That being said, it is not a good idea to spawn as many as 2000 processes, even if you have a 16-core Intel Skylake machine, because creating a new process on the system is not a lightweight task. A number of things happen in the background: generating the pid, allocating memory, setting up the address space, scheduling the process, context switching, and managing its entire life cycle. So spawning a new process is a heavy operation for the kernel.
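
To get a feel for that overhead, here is a rough timing sketch of my own (the absolute numbers are machine- and OS-dependent):

import time
from multiprocessing import Process

def noop():
    pass

if __name__ == '__main__':
    start = time.perf_counter()
    procs = [Process(target=noop) for _ in range(100)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Even with an empty target, starting and reaping 100 processes
    # typically takes a noticeable fraction of a second
    print(f"100 processes: {time.perf_counter() - start:.2f}s")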

Unfortunately, I am guessing that what you are trying to do is a CPU-bound task, and hence limited by the hardware you have on the machine. Spinning up more processes than the number of cores on your system is not going to help at all, but creating a process pool might. So basically you want to create a pool with as many processes as you have cores on the system and then pass the inputs to the pool. Something like this:

from multiprocessing import Pool, cpu_count

def target_func(data):
    ...  # process one input item

if __name__ == '__main__':
    # One worker per core; the pool queues the ~2000 tasks and hands them
    # to workers as they become free
    with Pool(processes=cpu_count()) as po:
        res = po.map(target_func, regressTuple)
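
A side note of mine, not part of the original answer: pool.map passes each element as a single argument, so the (x,) wrapping in regressTuple is only needed for the Process(args=...) form. With a pool you could map over regressList directly, or unpack the tuples with starmap:

res = po.map(runRegressWriteStatus, regressList)       # target takes one plain item
res = po.starmap(runRegressWriteStatus, regressTuple)  # unpacks each (x,) tuple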
