Reading output from a child process using Python


The Question

I am using the subprocess module to start a process from python. I want to be able to access the output (stdout, stderr) as soon as it is written/buffered.

  • The solution must support Windows 7. I also need a solution for Unix systems, but I suspect the Windows case is the harder one to solve.
  • The solution should support Python 2.6. I am currently restricted to Python 2.6, but solutions using later versions of Python are still appreciated.
  • The solution should not use third-party libraries. Ideally I would like a solution using the standard library, but I am open to suggestions.
  • The solution must work for just about any process. Assume there is no control over the process being executed.

For example, imagine I want to run a python file called counter.py via a subprocess. The contents of counter.py is as follows:

import sys

for index in range(10):

    # Write data to standard out.
    sys.stdout.write(str(index))

    # Push buffered data to disk.
    sys.stdout.flush()

The Parent Process

The parent process responsible for executing the counter.py example is as follows:

import subprocess

cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    ) 

The Issue

Using the counter.py example I can access the data before the process has completed. This is great! This is exactly what I want. However, removing the sys.stdout.flush() call prevents the data from being accessed at the time I want it. This is bad! This is exactly what I don't want. My understanding is that the flush() call forces the data to be written to disk and before the data is written to disk it exists only in a buffer. Remember I want to be able to run just about any process. I do not expect the process to perform this kind of flushing but I still expect the data to be available in real time (or close to it). Is there a way to achieve this?

A quick note about the parent process. You may notice I am using bufsize=1 for line buffering. I was hoping this would cause a flush to disk for every line, but it doesn't seem to work that way. How does this argument work?

You will also notice I am using subprocess.PIPE. This is because it appears to be the only value which produces IO objects between the parent and child processes. I have come to this conclusion by looking at the Popen._get_handles method in the subprocess module (I'm referring to the Windows definition here). There are two important variables, c2pread and c2pwrite which are set based on the stdout value passed to the Popen constructor. For instance, if stdout is not set, the c2pread variable is not set. This is also the case when using file descriptors and file-like objects. I don't really know whether this is significant or not but my gut instinct tells me I would want both read and write IO objects for what I am trying to achieve - this is why I chose subprocess.PIPE. I would be very grateful if someone could explain this in more detail. Likewise, if there is a compelling reason to use something other than subprocess.PIPE I am all ears.
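As a quick aside, the effect of passing subprocess.PIPE can be checked directly without reading Popen internals. In a minimal sketch (modern Python syntax for brevity), the Popen instance only exposes a readable stdout file object when subprocess.PIPE was passed; otherwise the attribute is None:

```python
import subprocess
import sys

# Illustrative check: process.stdout is a readable file object only when
# stdout=subprocess.PIPE is passed; with the default it is None.
with_pipe = subprocess.Popen(
    [sys.executable, '-c', 'print("hi")'],
    stdout=subprocess.PIPE)
without_pipe = subprocess.Popen(
    [sys.executable, '-c', 'pass'])

print(with_pipe.stdout is not None)    # True: a pipe end to read from
print(without_pipe.stdout is None)     # True: child inherited our stdout

with_pipe.communicate()
without_pipe.wait()
```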

import time
import subprocess
import threading
import Queue


class StreamReader(threading.Thread):
    """
    Threaded object used for reading process output stream (stdout, stderr).   
    """

    def __init__(self, stream, queue, *args, **kwargs):
        super(StreamReader, self).__init__(*args, **kwargs)
        self._stream = stream
        self._queue = queue

        # Event used to terminate thread. This way we will have a chance to 
        # tie up loose ends. 
        self._stop = threading.Event()

    def stop(self):
        """
        Stop thread. Call this function to terminate the thread. 
        """
        self._stop.set()

    def stopped(self):
        """
        Check whether the thread has been terminated.
        """
        return self._stop.isSet()

    def run(self):
        while True:
            # Flush buffered data (not sure this actually works?)
            self._stream.flush()

            # Read available data.
            for line in iter(self._stream.readline, b''):
                self._queue.put(line)

            # Breather.
            time.sleep(0.25)

            # Check whether thread has been terminated.
            if self.stopped():
                break


cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    )

stdout_queue = Queue.Queue()
stdout_reader = StreamReader(process.stdout, stdout_queue)
stdout_reader.daemon = True
stdout_reader.start()

# Read standard out of the child process whilst it is active.  
while True:

    # Attempt to read available data.  
    try:
        line = stdout_queue.get(timeout=0.1)
        print '%s' % line

    # If data was not read within time out period. Continue. 
    except Queue.Empty:
        # No data currently available.
        pass

    # Check whether child process is still active.
    if process.poll() is not None:

        # Process is no longer active.
        break

# Process is no longer active. Nothing more to read. Stop reader thread.
stdout_reader.stop()

Here I am performing the logic which reads standard out from the child process in a thread. This allows for the scenario in which the read is blocking until data is available. Instead of waiting for some potentially long period of time, we check whether there is available data, to be read within a time out period, and continue looping if there is not.

I have also tried another approach using a kind of non-blocking read. This approach uses the ctypes module to access Windows system calls. Please note that I don't fully understand what I am doing here - I have simply tried to make sense of some example code I have seen in other posts. In any case, the following snippet doesn't solve the buffering issue. My understanding is that it's just another way to combat a potentially long read time.

import os
import subprocess

import ctypes
import ctypes.wintypes
import msvcrt

cmd = ['python', 'counter.py']

process = subprocess.Popen(
    cmd,
    bufsize=1,
    stdout=subprocess.PIPE,
    )


def read_output_non_blocking(stream):
    data = ''
    available_bytes = 0

    c_read = ctypes.c_ulong()
    c_available = ctypes.c_ulong()
    c_message = ctypes.c_ulong()

    fileno = stream.fileno()
    handle = msvcrt.get_osfhandle(fileno)

    # Read available data.
    buffer_ = None
    bytes_ = 0
    status = ctypes.windll.kernel32.PeekNamedPipe(
        handle,
        buffer_,
        bytes_,
        ctypes.byref(c_read),
        ctypes.byref(c_available),
        ctypes.byref(c_message),
        )

    if status:
        available_bytes = int(c_available.value)

    if available_bytes > 0:
        data = os.read(fileno, available_bytes)
        print data

    return data

while True:

    # Read standard out for child process.
    stdout = read_output_non_blocking(process.stdout)
    print stdout

    # Check whether child process is still active.
    if process.poll() is not None:

        # Process is no longer active.
        break

Comments are very much appreciated.

Cheers

The Answer

At issue here is buffering by the child process. Your subprocess code already works as well as it could, but if you have a child process that buffers its output then there is nothing that subprocess pipes can do about this.

I cannot stress this enough: the buffering delays you see are the responsibility of the child process, and how it handles buffering has nothing to do with the subprocess module.

You already discovered this; this is why adding sys.stdout.flush() in the child process makes the data show up sooner: the child process uses buffered I/O (a memory cache that collects written data) before sending it down the sys.stdout pipe [1].

Python automatically uses line-buffering when sys.stdout is connected to a terminal; the buffer flushes whenever a newline is written. When using pipes, sys.stdout is not connected to a terminal and a fixed-size buffer is used instead.
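You can see this from the child's side. In the sketch below (which uses subprocess.check_output, available from Python 2.7; on 2.6 the same check can be done with Popen().communicate()), the child reports that its stdout is not a terminal when connected via a pipe, which is exactly the condition under which Python switches to a fixed-size block buffer:

```python
import subprocess
import sys

# The child inspects its own stdout. Run through a pipe, isatty() is
# False, so the interpreter block-buffers instead of line-buffering.
child_code = "import sys; sys.stdout.write(str(sys.stdout.isatty()))"

output = subprocess.check_output([sys.executable, '-c', child_code])
print(output)  # b'False'
```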

Now, the Python child process can be told to handle buffering differently; you can set an environment variable or use a command-line switch to alter how it uses buffering for sys.stdout (and sys.stderr and sys.stdin). From the Python command line documentation:

-u
Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.

[...]

PYTHONUNBUFFERED
If this is set to a non-empty string it is equivalent to specifying the -u option.
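To make this concrete, here is a sketch (modern Python syntax; the timings are illustrative) of a counter-style child that never calls flush(), launched with -u. Because its stdout is unbuffered, the first byte crosses the pipe long before the child exits:

```python
import subprocess
import sys
import time

# Child that writes without ever flushing, pausing between writes.
child_code = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    sys.stdout.write(str(i))\n"
    "    time.sleep(1.0)\n"
)

proc = subprocess.Popen(
    [sys.executable, '-u', '-c', child_code],
    stdout=subprocess.PIPE)

start = time.time()
first = proc.stdout.read(1)    # without -u this would block ~3 seconds
waited = time.time() - start
proc.wait()
```

Passing env=dict(os.environ, PYTHONUNBUFFERED='1') to Popen achieves the same effect without changing the child's command line.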

If you are dealing with child processes that are not Python processes and you experience buffering issues with those, you'll need to look at the documentation of those processes to see if they can be switched to use unbuffered I/O, or be switched to more desirable buffering strategies.
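For stdio-based programs on POSIX systems (so this does not help with the Windows 7 requirement, but covers the Unix side), coreutils' stdbuf can force a buffering strategy from the outside. A sketch: tr normally block-buffers when its stdout is a pipe, so its output would not appear until EOF, but with stdbuf -oL each line is flushed as it is produced:

```python
import subprocess

# POSIX-only: wrap the child in `stdbuf -oL` to force line buffering.
proc = subprocess.Popen(
    ['stdbuf', '-oL', 'tr', 'a-z', 'A-Z'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)

proc.stdin.write(b'hello\n')
proc.stdin.flush()
line = proc.stdout.readline()  # arrives promptly, before stdin is closed
proc.stdin.close()
proc.wait()
```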

One thing you could try is to use the script -c command to provide a pseudo-terminal to a child process. This is a POSIX tool, however, and is probably not available on Windows.
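A related approach, also POSIX-only, is the standard library's pty module: give the child a pseudo-terminal so its stdio believes it is interactive and line-buffers on its own. A rough sketch (details such as OSError handling on slave close vary by platform):

```python
import os
import pty
import subprocess
import sys

# Child that writes lines but never calls flush().
child_code = (
    "import sys\n"
    "for i in range(3):\n"
    "    sys.stdout.write(str(i) + '\\n')\n"
)

master_fd, slave_fd = pty.openpty()
proc = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdout=slave_fd, close_fds=True)
os.close(slave_fd)  # parent keeps only the master end

chunks = []
while True:
    try:
        data = os.read(master_fd, 1024)
    except OSError:   # raised on Linux once the slave end closes
        break
    if not data:
        break
    chunks.append(data)
proc.wait()
os.close(master_fd)
captured = b''.join(chunks)
```

Note that the terminal driver translates '\n' to '\r\n', so the captured bytes differ slightly from what a pipe would deliver.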

[1] It should be noted that when flushing a pipe, no data is 'written to disk'; all data remains entirely in memory here. I/O buffers are just memory caches that get the best performance out of I/O by handling data in larger chunks. Only if you have a disk-based file object would fileobj.flush() cause it to push any buffers to the OS, which usually means the data is indeed written to disk.
