Streaming read from subprocess

Question

我需要在子进程产生时读取它的输出——也许不是每次write,而是在进程完成之前.我已经尝试过 Python3 文档和 SO 问题here此处,但在孩子终止之前我仍然一无所获.

该应用程序用于监控深度学习模型的训练.我需要获取测试输出(每次迭代大约 250 个字节,间隔大约 1 分钟)并观察统计失败.

  • 我无法更改训练引擎;例如,我不能在子进程代码中插入 stdout.flush().
  • 我可以合理地等待十多行输出的积累;我希望缓冲区填充可以解决我的问题.

代码:注释掉变体.

家长

cmd = ["/usr/bin/python3", "zzz.py"]# test_proc = subprocess.Popen(test_proc = subprocess.run(指令,标准输出=子进程.PIPE,stderr=subprocess.STDOUT)out_data = ""打印(时间.时间(),开始")而不是在 str(out_data) 中退出":out_data = test_proc.stdout# out_data, err_data = test_proc.communicate()打印(时间.时间(),主要收到",out_data)

子(zzz.py)

从时间导入睡眠导入系统对于 _ 范围(5):打印(_,睡觉",."* 1000)# sys.stdout.flush()睡觉(1)打印(退出这个练习")

尽管发送了 1000 多个字节的行,缓冲区(在其他地方测试为 2kb;在这里,我已经高达 50kb)填充不会导致父级看到"新文本.

我缺少什么才能让它发挥作用?

<小时>

关于链接、评论和 iBug 发布的答案的更新:

  • Popen 而不是 run 修复了阻塞问题.不知何故,我在文档和我对两者的实验中都错过了这一点.
  • universal_newline=True 巧妙地将字节返回到字符串:在接收端更容易处理,尽管带有交错的空行(易于检测和丢弃).
  • bufsize 设置为很小的值(例如 1)没有任何影响;父级仍然必须等待子级填充 stdout 缓冲区,在我的情况下为 8k.
  • export PYTHONUNBUFFERED=1 在执行之前did 修复了缓冲问题.感谢 wim 提供链接.

除非有人想出一个规范的、漂亮的解决方案使这些过时,否则我明天会接受 iBug 的回答.

解决方案

subprocess.run 总是产生子进程,并且阻塞线程直到它退出.>

您唯一的选择是使用 p = subprocess.Popen(...) 并使用 s = p.stdout.readline() 读取行代码>p.stdout.__iter__()(见下文).

此代码适用于我,如果子进程在打印一行后刷新标准输出(请参阅下面的扩展说明).

cmd = ["/usr/bin/python3", "zzz.py"]test_proc = subprocess.Popen(指令,标准输出=子进程.PIPE,stderr=subprocess.STDOUT)out_data = ""打印(时间.时间(),开始")而不是在 str(out_data) 中退出":out_data = test_proc.stdout.readline()打印(时间.时间(),主要收到",out_data)test_proc.communicate() # 关闭它

查看我的终端日志(从 zzz.py 中删除的点):

ibug@ubuntu:~/t $ python3 p.py1546450821.9174328 开始1546450821.9793346 MAIN 收到 b'0 睡眠 \n'1546450822.987753 MAIN 收到 b'1 睡眠 \n'1546450823.993136 MAIN 收到 b'2 睡眠 \n'1546450824.997726 MAIN 收到 b'3 睡眠 \n'1546450825.9975247 MAIN 收到 b'4 睡眠 \n'1546450827.0094354 MAIN 收到 b'QUIT this practice\n'

您也可以使用 for 循环来实现:

 for out_data in test_proc.stdout:如果 str(out_data) 中的退出":休息打印(时间.时间(),主要收到",out_data)

<小时>

如果您无法修改子进程,unbuffer(来自包 expect - 使用 APT 或 YUM 安装)可能会有所帮助.这是我的工作父代码没有更改子代码.

test_proc = subprocess.Popen([取消缓冲"] + cmd,标准输出=子进程.PIPE,stderr=subprocess.STDOUT)

I need to read output from a child process as it's produced; perhaps not on every write, but well before the process completes. I've tried solutions from the Python 3 docs and a couple of related Stack Overflow questions, but I still get nothing until the child terminates.

The application is for monitoring training of a deep learning model. I need to grab the test output (about 250 bytes for each iteration, at roughly 1-minute intervals) and watch for statistical failures.

  • I cannot change the training engine; for instance, I cannot insert stdout.flush() in the child process code.
  • I can reasonably wait for a dozen lines of output to accumulate; I was hoping that filling the buffer would solve my problem.

Code: variations are commented out.

Parent

import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]
# test_proc = subprocess.Popen(
test_proc = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = ""
print(time.time(), "START")
while "QUIT" not in str(out_data):
    out_data = test_proc.stdout
    # out_data, err_data = test_proc.communicate()
    print(time.time(), "MAIN received", out_data)

Child (zzz.py)

from time import sleep
import sys

for _ in range(5):
    print(_, "sleeping", "."*1000)
    # sys.stdout.flush()
    sleep(1)

print("QUIT this exercise")

Despite sending lines of 1000+ bytes, filling the buffer (tested elsewhere at 2 KB; here I've gone as high as 50 KB) doesn't cause the parent to "see" the new text.

What am I missing to get this to work?


Update with regard to links, comments, and iBug's posted answer:

  • Popen instead of run fixed the blocking issue. Somehow I missed this in the documentation and my experiments with both.
  • universal_newlines=True neatly changed the returned bytes into strings: easier to handle on the receiving end, although with interleaved empty lines (easy to detect and discard).
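  • Setting bufsize to something tiny (e.g. 1) didn't affect anything; the parent still has to wait for the child to fill its stdout buffer, 8 KB in my case. (bufsize only controls buffering on the parent's end of the pipe, not the child's own stdout buffer.)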
  • export PYTHONUNBUFFERED=1 before execution did fix the buffering problem. Thanks to wim for the link.
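The two fixes from this update can be combined without touching the shell. A minimal sketch, assuming the child is a Python program (only then does PYTHONUNBUFFERED apply): it sets PYTHONUNBUFFERED only in the child's environment through Popen's env argument, and uses text=True, the newer spelling of universal_newlines=True (Python 3.7+). The interleaved empty lines come from print() adding its own newline after the one already at the end of each received line, so stripping it avoids them. The inline CHILD script is a hypothetical stand-in for zzz.py, and stream_unbuffered is an illustrative helper name.

```python
import os
import subprocess
import sys
import time

# Hypothetical stand-in for zzz.py: prints without flushing, like the real child.
CHILD = """
from time import sleep
for i in range(3):
    print(i, "sleeping")
    sleep(0.2)
print("QUIT this exercise")
"""


def stream_unbuffered(cmd):
    """Yield the child's output lines (without trailing newline) as they arrive."""
    # Copy the parent's environment and disable buffering only for the child.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,  # decode bytes to str, same as universal_newlines=True
        env=env,
    ) as proc:
        for line in proc.stdout:
            # Strip the newline so print() below does not produce blank lines.
            yield line.rstrip("\n")


if __name__ == "__main__":
    for line in stream_unbuffered([sys.executable, "-c", CHILD]):
        print(time.time(), "MAIN received", line)
```

Passing env instead of exporting the variable keeps the setting scoped to this one child; `python3 -u` in cmd has the same effect for a Python child.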

Unless someone comes up with a canonical, nifty solution that makes these obsolete, I'll accept iBug's answer tomorrow.

Solution

subprocess.run spawns the child process and blocks the calling thread until it exits.
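A quick way to see the difference, as a sketch: subprocess.run returns a CompletedProcess only after the child has exited, while subprocess.Popen hands back a process object as soon as the child has started, so the parent can read from the pipe while the child is still running. The sleeping child is only a placeholder.

```python
import subprocess
import sys
import time

# Placeholder child: a Python process that just sleeps for half a second.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(0.5)"]

# run() blocks: by the time it returns, the child has already exited.
start = time.monotonic()
completed = subprocess.run(SLEEPER)
run_elapsed = time.monotonic() - start

# Popen() returns as soon as the child has been started.
start = time.monotonic()
proc = subprocess.Popen(SLEEPER)
popen_elapsed = time.monotonic() - start
still_running = proc.poll() is None  # poll() returns None while the child is alive
proc.wait()  # reap the child before moving on

print(f"run() returned after {run_elapsed:.2f}s with code {completed.returncode}")
print(f"Popen() returned after {popen_elapsed:.2f}s; child still running: {still_running}")
```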

Instead, use p = subprocess.Popen(...), which returns as soon as the child has started, and read lines with s = p.stdout.readline() or by iterating over p.stdout (see below).

This code works for me if the child process flushes stdout after printing each line (see the note at the end if you cannot change the child).

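import subprocess
import time

cmd = ["/usr/bin/python3", "zzz.py"]
test_proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)

out_data = ""
print(time.time(), "START")
while "QUIT" not in str(out_data):
    out_data = test_proc.stdout.readline()
    if not out_data:  # b"" means EOF: the child closed stdout without printing QUIT
        break
    print(time.time(), "MAIN received", out_data)
test_proc.communicate()  # wait for the child to exit and collect any remaining output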

See my terminal log (dots removed from zzz.py):

ibug@ubuntu:~/t $ python3 p.py
1546450821.9174328 START
1546450821.9793346 MAIN received b'0 sleeping \n'
1546450822.987753 MAIN received b'1 sleeping \n'
1546450823.993136 MAIN received b'2 sleeping \n'
1546450824.997726 MAIN received b'3 sleeping \n'
1546450825.9975247 MAIN received b'4 sleeping \n'
1546450827.0094354 MAIN received b'QUIT this exercise\n'

You can also do it with a for loop:

for out_data in test_proc.stdout:
    if "QUIT" in str(out_data):
        break
    print(time.time(), "MAIN received", out_data)
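One caveat with the for loop: breaking out on "QUIT" leaves the pipe open and the child unreaped. Using Popen as a context manager (supported since Python 3.2) closes the pipes and waits for the child when the block exits. A sketch reusing the binary-mode loop above; the inline child is a hypothetical stand-in for zzz.py, run with -u so that it does not buffer its output.

```python
import subprocess
import sys
import time

# Stand-in child (hypothetical): -u makes its stdout unbuffered, like PYTHONUNBUFFERED=1.
CHILD = "for i in range(3): print(i, 'sleeping')\nprint('QUIT this exercise')"
cmd = [sys.executable, "-u", "-c", CHILD]

received = []
with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as test_proc:
    for out_data in test_proc.stdout:  # bytes lines, each ending in b"\n"
        if b"QUIT" in out_data:
            break
        received.append(out_data)
        print(time.time(), "MAIN received", out_data)
# Leaving the with-block closed the pipe and waited for the child,
# so its exit status is now available.
print("child exited with code", test_proc.returncode)
```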


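If you cannot modify the child process, unbuffer (from the expect package; install it with APT or YUM) may help. It runs the command on a pseudo-terminal, so the child's standard output is line-buffered as it would be in an interactive session. This is my working parent code, with the child code unchanged.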

test_proc = subprocess.Popen(
    ["unbuffer"] + cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT
)
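If installing expect is not an option, the same pseudo-terminal trick can be done with Python's standard pty module. This swaps in a different technique from unbuffer itself, and it is a minimal sketch under stated assumptions: POSIX only (pty is not available on Windows), and read_via_pty is a hypothetical helper name. Because output passes through a terminal, lines arrive ending in "\r\n" rather than "\n".

```python
import errno
import os
import pty
import subprocess
import sys


def read_via_pty(cmd):
    """Yield decoded output lines from cmd, running it on a pseudo-terminal.

    Hypothetical helper; POSIX only.
    """
    master, slave = pty.openpty()
    # The child's stdout/stderr are the terminal side, so it sees isatty() == True.
    proc = subprocess.Popen(cmd, stdout=slave, stderr=slave)
    os.close(slave)  # the parent keeps only the master end
    pending = b""
    try:
        while True:
            try:
                chunk = os.read(master, 1024)
            except OSError as exc:
                # Linux raises EIO once the child has closed the terminal side.
                if exc.errno != errno.EIO:
                    raise
                chunk = b""
            if not chunk:  # EOF (BSD/macOS return b"" instead of raising)
                break
            pending += chunk
            # The terminal turns "\n" into "\r\n": split on "\n", strip the "\r".
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line.rstrip(b"\r").decode(errors="replace")
        if pending:
            yield pending.rstrip(b"\r").decode(errors="replace")
    finally:
        os.close(master)
        proc.wait()


if __name__ == "__main__":
    child = "import sys; print(sys.stdout.isatty()); print('QUIT this exercise')"
    for line in read_via_pty([sys.executable, "-c", child]):
        print("MAIN received", repr(line))
```

The child reports that its stdout is a terminal, which is why it flushes after each line without any change to its code.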
