Limiting the number of processes running at a time from a Python script
Question
I'm running a backup script that launches child processes to perform backups via rsync. However, I have no way to limit the number of rsyncs it launches at a time.
Here's the code I'm working on at the moment:
print "active_children: ", multiprocessing.active_children()
print "active_children len: ", len(multiprocessing.active_children())
while len(multiprocessing.active_children()) > 49:
    sleep(2)

p = multiprocessing.Process(target=do_backup, args=(shash["NAME"], ip, shash["buTYPE"]))
jobs.append(p)
p.start()
This is showing a maximum of one child when I have hundreds of rsyncs running. Here's the code that actually launches the rsync (from inside the do_backup function), with command being a variable containing the rsync line:
print command
subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
return 1
If I add a sleep(x) to the do_backup function, it shows up as an active child while it's sleeping. The process table is also showing the rsync processes as having a PPID of 1. I'm assuming from this that rsync splits off and is no longer a child of python, which allows my child process to die so I can't count it anymore. Does anyone know how to keep the python child alive and counted until the rsync is complete?
Answer
Let's first clear up some misconceptions.
I'm assuming from this that the rsync splits off and is no longer a child of python which allows my child process to die so I can't count it anymore.
rsync does "split off". On UNIX systems, this is called a fork.
When a process forks, a child process is created - so rsync is a child of python. This child executes independently of the parent - and concurrently ("at the same time").
A process can manage its own children. There are specific syscalls for that, but it's a bit off-topic when talking about python, which has its own high-level interfaces.
If you check subprocess.Popen's documentation, you'll notice that it's not a function call at all: it's a class. By calling it, you create an instance of that class - a Popen object. Such objects have multiple methods. In particular, wait will allow you to block your parent process (python) until the child process terminates.
With this in mind, let's take a look at your code and simplify it a bit:
p = multiprocessing.Process(target=do_backup, ...)
Here, you're actually forking and creating a child process. This process is another python interpreter (as with all multiprocessing processes), and will execute the do_backup function.
def do_backup():
    subprocess.Popen("rsync ...", ...)
Here, you are forking again. You'll create yet another process (rsync), and let it run "in the background", because you're not wait()ing for it.
With all this cleared up, I hope you can see a way forward with your existing code. If you want to reduce its complexity, I recommend you check and adapt JoErNanO's answer, which reuses multiprocessing.Pool to automate keeping track of the processes.
Whichever way you decide to pursue, you should avoid forking with Popen to create the rsync process - as that creates yet another process unnecessarily. Instead, check os.execv, which replaces the current process with another.