Unbuffered read from stdin in Python

Problem description

I'm writing a Python script that can read input through a pipe from another command, like so:

batch_job | myparser

My script myparser processes the output of batch_job and writes to its own stdout. My problem is that I want to see the output immediately (the output of batch_job is processed line by line), but there appears to be this notorious stdin buffering (allegedly 4KB, I haven't verified) which delays everything.
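The size and source of the delay can be checked empirically. A minimal sketch, assuming Python 3 on a POSIX-like system; time_first_line and the PRODUCER stand-in for batch_job are hypothetical names. The child prints one line, flushes, then sleeps, and the parent measures how long its readline waits. A reader that returns well before the sleep ends is not waiting for a full block; swapping the real reading code into the parent, or the real command into the child, would show which side of the pipe introduces the delay.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for batch_job: emit one line, flush it, then stay alive.
PRODUCER = (
    "import sys, time\n"
    "sys.stdout.write('first line\\n')\n"
    "sys.stdout.flush()\n"
    "time.sleep(5)\n"
)


def time_first_line():
    """Return (first line, seconds waited for it) read from the producer."""
    proc = subprocess.Popen([sys.executable, "-c", PRODUCER],
                            stdout=subprocess.PIPE)
    try:
        start = time.monotonic()
        line = proc.stdout.readline()  # returns once one full line is available
        return line, time.monotonic() - start
    finally:
        # Don't wait for the producer's sleep to finish.
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

Calling line, waited = time_first_line() and printing waited shows the latency; a value close to the producer's sleep would indicate that something is holding the line back until EOF.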

The problem has already been discussed here, here and here.

I have tried the following:

  • open stdin using os.fdopen(sys.stdin.fileno(), 'r', 0)
  • using -u in my hashbang: #!/usr/bin/python -u
  • setting export PYTHONUNBUFFERED=1 right before calling the script
  • flushing my output after each line that was read (just in case the problem was coming from output buffering rather than input buffering)

My Python version is 2.4.3, and I have no way of upgrading or installing any additional programs or packages. How can I get rid of these delays?

Recommended answer

I've encountered the same issue with legacy code. It appears to be a problem with the implementation of the __next__ method of Python 2's file object (the method is actually spelled next in Python 2). It uses a Python-level read-ahead buffer, which -u/PYTHONUNBUFFERED=1 doesn't affect: those only unbuffer the stdio FILE*s themselves, and file.__next__'s buffering is unrelated to them. Similarly, stdbuf/unbuffer can't change any of this buffering, because Python replaces the default buffer created by the C runtime: the last thing file.__init__ does for a newly opened file is call PyFile_SetBufSize, which uses the setvbuf/setbuf APIs to replace the default stdio buffer.

The problem is seen when you have a loop of the form:

for line in sys.stdin:
    pass  # do stuff with line

Here the first call to __next__ (made implicitly by the for loop to fetch each line) blocks until it has filled the whole block, before producing even a single line.
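The difference between "fill the whole block first" and "one system call per refill" can be modeled with two small readers over a raw file descriptor. This is a minimal sketch assuming Python 3 and POSIX os.read semantics; both function names are hypothetical, and read_block_fully is an illustration of the blocking behavior, not CPython's actual read-ahead code. read_block_fully keeps calling read until the whole block arrives, so on a live pipe it waits for 4096 bytes or EOF; iter_lines_partial makes a single os.read per refill and hands out each line as soon as its newline is in the buffer.

```python
import os


def read_block_fully(fd, size):
    """Model of a fill that insists on `size` bytes (or EOF) before returning."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes than asked
        if not chunk:                   # b'' means EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def iter_lines_partial(fd, size=4096):
    """Yield lines as soon as they are complete, one os.read() per refill."""
    buf = b""
    while True:
        newline = buf.find(b"\n")
        if newline >= 0:
            line, buf = buf[:newline + 1], buf[newline + 1:]
            yield line
            continue
        chunk = os.read(fd, size)  # returns whatever is available, up to size
        if not chunk:
            if buf:
                yield buf  # last line without a trailing newline
            return
        buf += chunk
```

Fed from an os.pipe() whose write end is still open, iter_lines_partial returns the first line as soon as it has been written, whereas read_block_fully would block until 4096 bytes arrive or the writer closes the pipe.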

There are three possible fixes:

  1. (Only on Python 2.6+) Rewrap sys.stdin with the io module (backported from Python 3 as a built-in) to bypass file entirely in favor of the (frankly superior) Python 3 design. That design uses a single system call at a time to populate the buffer, without blocking until the full requested read has occurred: if it asks for 4096 bytes and gets 3, it checks whether a line is available and produces it if so. For example:

import io
import sys

# Add a buffering=0 argument if you won't always consume stdin completely,
# so that no data can be lost in the wrapper's buffer. It'll be slower with
# buffering=0, though.
with io.open(sys.stdin.fileno(), 'rb', closefd=False) as stdin:
    for line in stdin:
        # Do stuff with the line
        pass

This will typically be faster than option 2, but it is more verbose and requires Python 2.6+. It also lets the rewrapped stream be Unicode-friendly: change the mode to 'r' and optionally pass the known encoding of the input (if it isn't the locale default) to get unicode lines seamlessly, instead of (effectively ASCII-only) str.
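The Unicode-friendly variant just described might look like this minimal sketch. The helper name read_text_lines is hypothetical and UTF-8 is an assumed input encoding; it opens the descriptor in text mode through io.open and yields unicode lines, and closefd=False leaves the underlying descriptor open after the wrapper is closed.

```python
import io


def read_text_lines(fd, encoding="utf-8"):
    """Yield decoded (unicode) lines from fd without closing fd afterwards."""
    # Text mode ('r') decodes with the given encoding; closefd=False means
    # closing the wrapper at the end of the with-block leaves fd usable.
    with io.open(fd, "r", encoding=encoding, closefd=False) as stream:
        for line in stream:
            yield line
```

For the question's parser this would be driven as for line in read_text_lines(sys.stdin.fileno()): followed by the per-line processing.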

  2. (Any version of Python) Work around the problems with file.__next__ by using file.readline instead. Despite nearly identical intended behavior, readline doesn't do its own (over)buffering: it delegates to C stdio's fgets (with default build settings) or to a manual loop that calls getc/getc_unlocked into a buffer and stops exactly when it hits the end of a line. Combined with two-argument iter, this gives nearly identical code without excess verbosity (it will probably be slower than the prior solution, depending on whether fgets is used under the hood and how the C runtime implements it):

import sys

# '' is the sentinel that ends the loop; readline returns '' at EOF
for line in iter(sys.stdin.readline, ''):
    # Do stuff with line
    pass
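The two-argument iter pattern works with any stream that has a readline method; the one trap is that the sentinel must equal exactly what readline returns at EOF, otherwise the loop never ends. A minimal sketch, where echo_lines is a hypothetical name, that copies lines one readline call at a time and flushes after each line so every line is visible downstream immediately:

```python
def echo_lines(instream, outstream, sentinel):
    """Copy lines one readline() call at a time, flushing after each line.

    `sentinel` must match what instream.readline() returns at EOF:
    '' for text streams, b'' for binary ones.
    """
    for line in iter(instream.readline, sentinel):
        outstream.write(line)
        outstream.flush()  # push each line through right away
```

On Python 3, for example, the binary layer returns bytes, so a call would be echo_lines(sys.stdin.buffer, sys.stdout.buffer, b''); on the question's Python 2.4, echo_lines(sys.stdin, sys.stdout, '') matches the loop above.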

  3. Move to Python 3, which doesn't have this problem. :-)
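Under Python 3 the plain for line in sys.stdin loop no longer waits for a full block. A minimal sketch of the question's myparser on that basis; process is a hypothetical placeholder (here it just strips the newline and upper-cases), and flush=True makes each result visible downstream at once.

```python
import sys


def process(line):
    """Hypothetical per-line transformation; replace with the real parsing."""
    return line.rstrip("\n").upper()


def main(instream=sys.stdin, outstream=sys.stdout):
    for line in instream:  # Python 3 text streams hand over lines as they arrive
        print(process(line), file=outstream, flush=True)
```

Saved as myparser.py with a main() call at the end, it would be used as batch_job | python3 myparser.py.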
