Using subprocess.Popen for a Process with Large Output
Question
I have some Python code that executes an external app. It works fine when the app produces a small amount of output, but hangs when there is a lot. My code looks like:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errcode = p.wait()
retval = p.stdout.read()
errmess = p.stderr.read()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode,errmess))
There are comments in the docs that seem to indicate the potential issue. Under wait(), there is:
Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
though under communicate(), I see:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
So it is unclear to me whether I should use either of these if I have a large amount of data. The docs don't indicate what method I should use in that case.
I do need the return value from the exec, and I do parse and use both the stdout and stderr.
So what is the equivalent method in Python to exec an external app that is going to have large output?
Recommended answer
You're doing blocking reads on two files; the first needs to complete before the second starts. If the application writes a lot to stderr and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be, since you're waiting for stdout).
There are a few ways you can fix this.
The simplest is to not intercept stderr; leave stderr=None. Errors will be output to stderr directly. You can't intercept them and display them as part of your own message. For command-line tools, this is often OK. For other apps, it can be a problem.
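A minimal sketch of this first option, assuming the child is a hypothetical Python one-liner that writes about 1 MB to stdout (the helper name run_capture_stdout is made up for illustration). It also reads stdout to EOF before calling wait(), which avoids the deadlock described in the docs warning quoted in the question:

```python
import subprocess
import sys


def run_capture_stdout(cmd):
    """Run cmd, capturing only stdout; stderr goes straight to our own stderr."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=None)
    # Drain stdout to EOF first, so the child never blocks on a full pipe,
    # and only then collect the exit status.
    out = p.stdout.read()
    p.stdout.close()
    errcode = p.wait()
    return errcode, out


# Hypothetical stand-in for the external app: writes ~1 MB to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
errcode, out = run_capture_stdout(child)
print(errcode, len(out))
```

Note that the whole output still ends up in memory here; if that is a concern, read and process p.stdout in chunks or line by line instead of calling read() once.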
Another simple approach is to redirect stderr to stdout, so you only have one incoming file: set stderr=subprocess.STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.
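As a rough illustration of the merged-stream approach, the sketch below uses a hypothetical child script that writes to both streams and exits with status 3; run_merged is an illustrative name, not part of any library:

```python
import subprocess
import sys


def run_merged(cmd):
    """Run cmd with stderr folded into stdout, so there is one pipe to read."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # A single blocking read is safe now: nothing else can fill up unread.
    merged = p.stdout.read()
    p.stdout.close()
    errcode = p.wait()
    return errcode, merged


# Hypothetical stand-in for the external app: output on both streams.
child_code = (
    "import sys\n"
    "sys.stdout.write('o' * 200000)\n"
    "sys.stderr.write('e' * 200000)\n"
    "sys.exit(3)\n"
)
errcode, merged = run_merged([sys.executable, "-c", child_code])
print(errcode, len(merged))
```

The order in which the two streams interleave depends on how the child buffers its output, so don't rely on error lines appearing next to the regular output that preceded them.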
The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. This probably doesn't work on Windows, where select only accepts sockets.
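One possible shape for the select-based approach is sketched below. It assumes a POSIX system, and both run_select and the child script are hypothetical names for illustration. It reads with os.read on the raw file descriptors so that a read never waits for more data than is currently available:

```python
import os
import select
import subprocess
import sys


def run_select(cmd):
    """Read stdout and stderr as data arrives on either pipe (POSIX only)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd, err_fd = p.stdout.fileno(), p.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    open_fds = [out_fd, err_fd]
    while open_fds:
        # Block until at least one pipe has data or has reached EOF.
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            data = os.read(fd, 65536)
            if data:
                chunks[fd].append(data)
            else:
                # An empty read means the child closed this pipe.
                open_fds.remove(fd)
    p.stdout.close()
    p.stderr.close()
    errcode = p.wait()
    return errcode, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])


# Hypothetical stand-in for the external app: a lot on stderr, nothing on stdout.
child_code = "import sys; sys.stderr.write('e' * 500000); sys.exit(1)"
errcode, out, err = run_select([sys.executable, "-c", child_code])
print(errcode, len(out), len(err))
```

This sketch collects everything in lists for simplicity; for truly large or unbounded output, the chunks could be parsed or written to disk inside the loop instead of being accumulated.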