Python subprocess: Giving stdin, reading stdout, then giving more stdin

Problem description

I'm working with a piece of scientific software called Chimera. For some of the code downstream of this question, it requires that I use Python 2.7.

I want to call a process, give that process some input, read its output, give it more input based on that, etc.

I've used Popen to open the process and process.stdin.write to pass standard input, but then I've gotten stuck trying to get output while the process is still running. process.communicate() stops the process, and process.stdout.readline() seems to keep me in an infinite loop.


Here's a simplified example of what I'd like to do:

Let's say I have a bash script called exampleInput.sh.

#!/bin/bash
# exampleInput.sh

# Read a number from the input
read -p 'Enter a number: ' num

# Multiply the number by 5
ans1=$( expr $num \* 5 )

# Give the user the multiplied number
echo $ans1

# Ask the user whether they want to keep going
read -p 'Based on the previous output, would you like to continue? ' doContinue

if [ $doContinue == "yes" ]
then
    echo "Okay, moving on..."
    # [...] more code here [...]
else
    exit 0
fi

Interacting with this through the command line, I'd run the script, type in "5" and then, if it returned "25", I'd type "yes" and, if not, I would type "no".

I want to run a python script where I pass exampleInput.sh "5" and, if it gives me "25" back, then I pass "yes"

So far, this is as close as I can get:

#!/home/user/miniconda3/bin/python2
# talk_with_example_input.py
import subprocess
process = subprocess.Popen(["./exampleInput.sh"], 
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)
process.stdin.write("5")

answer = process.communicate()[0]

if answer == "25":
    process.stdin.write("yes")
    ## I'd like to print the STDOUT here, but the process is already terminated

But that fails, of course, because after process.communicate() my process isn't running anymore.


(Just in case/FYI): Actual problem

Chimera is usually a gui-based application to examine protein structure. If you run chimera --nogui, it'll open up a prompt and take input.

I often need to know what chimera outputs before I run my next command. For example, I will often try to generate a protein surface and, if Chimera can't generate a surface, it doesn't break--it just says so through STDOUT. So, in my python script, while I'm looping through many proteins to analyze, I need to check STDOUT to know whether to continue analysis on that protein.

In other use cases, I'll run lots of commands through Chimera to clean up a protein first, and then I'll want to run lots of separate commands to get different pieces of data, and use that data to decide whether to run other commands. I could get the data, close the subprocess, and then run another process, but that would require re-running all of those cleaning up commands each time.

Anyways, those are some of the real-world reasons why I want to be able to push STDIN to a subprocess, read the STDOUT, and still be able to push more STDIN.

Thanks for your time!

Solution

You don't need to use process.communicate in your example.

Simply read and write using process.stdin.write and process.stdout.readline. Make sure to send a newline, otherwise the script's read won't return. And when you read from the child's stdout, you also have to handle the newline that echo appends.

Note: process.stdout.read() with no size argument will block until EOF, so use readline (or a sized read) while the process is still running.

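# talk_with_example_input.py
import subprocess

process = subprocess.Popen(["./exampleInput.sh"],
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE)

# The trailing newline completes the script's first `read`.
process.stdin.write("5\n")
process.stdin.flush()
# readline() returns as soon as the script has echoed one full line.
stdout = process.stdout.readline()
print(stdout)

if stdout == "25\n":
    process.stdin.write("yes\n")
    process.stdin.flush()
    print(process.stdout.readline())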

$ python2 talk_with_example_input.py
25

Okay, moving on...
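
The write/flush/readline pattern above can be wrapped in a small helper. The following is only a sketch: ask() and CHILD_CODE are hypothetical names, and the child is a short inline Python program standing in for exampleInput.sh so that the example runs anywhere. It is written to run on Python 3 as well as 2.7, which is why it flushes explicitly and asks for text-mode pipes.

```python
import subprocess
import sys

# Hypothetical stand-in for exampleInput.sh: multiplies a number by 5,
# then waits for "yes" before printing a second line.
CHILD_CODE = r"""
import sys
num = int(sys.stdin.readline())
print(num * 5)
sys.stdout.flush()
if sys.stdin.readline().strip() == "yes":
    print("Okay, moving on...")
    sys.stdout.flush()
"""


def ask(process, line):
    """Send one line to the child and return its next output line (hypothetical helper)."""
    process.stdin.write(line + "\n")
    process.stdin.flush()  # push the data into the pipe right away
    return process.stdout.readline().rstrip("\n")


process = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # exchange text instead of bytes on Python 3
)

answer = ask(process, "5")
# Only send the second input if the first answer is the expected one.
followup = ask(process, "yes") if answer == "25" else None

process.stdin.close()
process.wait()
print(answer)
print(followup)
```

This only works when the child answers every input with exactly one line; a program that prints a variable number of lines needs one of the timeout-based readers discussed further down.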



Update

When communicating with a program in this way, you have to pay close attention to what the application actually writes. The best approach is to inspect the raw output as a hex dump:

$ chimera --nogui 2>&1 | hexdump -C

Please note that readline [1] only reads to the next newline (\n). In your case you have to call readline at least four times to get that first block of output.

If you just want to read everything up until the subprocess stops printing, you have to read byte by byte and implement a timeout. Sadly, neither read nor readline provides such a timeout mechanism. This is probably because the underlying read syscall [2] (Linux) does not provide one either.

On Linux we can write a single-threaded read_with_timeout() using poll / select (epoll in the version here). For an example see [3].

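from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    try:
        while True:
            ret = e.poll(timeout__s)
            # Stop on timeout (empty list) or when the event is not "readable".
            # The event field is a bitmask, so test it with & instead of `is`.
            if not ret or not ret[0][1] & EPOLLIN:
                break
            data = fd.read(1)
            if not data:  # EOF: the child closed its stdout
                break
            buf.append(data)
    finally:
        e.close()
    return ''.join(buf)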
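
The epoll version above is Linux-only and reads one byte per call. As a rough alternative sketch (not part of the original answer), the same idea can be written with select.select and os.read, which should also work on other Unix systems and collects whatever is available in one call. read_until_quiet and CHILD_CODE are hypothetical names, and the child is an inline Python program used only to make the example self-contained. select on pipes does not work on Windows.

```python
import os
import select
import subprocess
import sys


def read_until_quiet(pipe, timeout_s, chunk_size=4096):
    """Collect bytes from pipe until it stays silent for timeout_s seconds or hits EOF."""
    fd = pipe.fileno()
    chunks = []
    while True:
        readable, _, _ = select.select([fd], [], [], timeout_s)
        if not readable:  # timeout: nothing new arrived
            break
        data = os.read(fd, chunk_size)  # returns what is available, up to chunk_size
        if not data:  # EOF: the child closed its stdout
            break
        chunks.append(data)
    return b"".join(chunks)


# Hypothetical child: prints two lines, pauses, then prints a third.
CHILD_CODE = r"""
import sys, time
print("first"); print("second"); sys.stdout.flush()
time.sleep(4)
print("third"); sys.stdout.flush()
"""

process = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
)

first_block = read_until_quiet(process.stdout, 1.5)   # gives up during the pause
second_block = read_until_quiet(process.stdout, 5)    # waits through the pause
process.wait()
print(first_block)
print(second_block)
```

Because os.read works on the raw file descriptor, avoid mixing it with process.stdout.readline() on the same pipe; data already pulled into the file object's buffer would be invisible to select.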

In case you need a reliable way to do non-blocking reads under both Windows and Linux, this answer might be helpful.
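
The linked answer is not reproduced here. One commonly used portable technique, sketched below as an assumption rather than as that answer's exact code, is to let a background thread do the blocking readline calls and hand the lines over through a queue, so the main thread can wait with a timeout. start_reader, read_line and CHILD_CODE are hypothetical names; the child is an inline Python program that echoes each input line in upper case.

```python
import subprocess
import sys
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def start_reader(pipe):
    """Start a daemon thread that pushes each line from pipe onto a queue."""
    lines = queue.Queue()

    def pump():
        # readline() blocks here, in the background thread only.
        for line in iter(pipe.readline, b""):
            lines.put(line)

    thread = threading.Thread(target=pump)
    thread.daemon = True  # do not keep the interpreter alive for this thread
    thread.start()
    return lines


def read_line(lines, timeout_s):
    """Return the next queued line, or None if nothing arrives within timeout_s."""
    try:
        return lines.get(timeout=timeout_s)
    except queue.Empty:
        return None


# Hypothetical child: echoes every input line in upper case.
CHILD_CODE = r"""
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write(line.upper())
    sys.stdout.flush()
"""

process = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
lines = start_reader(process.stdout)

process.stdin.write(b"hello\n")
process.stdin.flush()
reply = read_line(lines, 5)        # the child's answer to "hello"
silence = read_line(lines, 0.2)    # the child is now waiting for more input

process.stdin.close()
process.wait()
print(reply)
print(silence)
```

The trade-off is one extra thread per pipe, in exchange for not depending on epoll or select support for pipes.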


[1] from the python 2 docs:

readline(limit=-1)

Read and return one line from the stream. If limit is specified, at most limit bytes will be read.

The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.

[2] from man 2 read:

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);

[3] example

$ tree
.
├── prog.py
└── prog.sh

prog.sh

#!/usr/bin/env bash

for i in $(seq 3); do
  echo "${RANDOM}"
  sleep 1
done

sleep 3
echo "${RANDOM}"

prog.py

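# prog.py
import subprocess
from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    try:
        while True:
            ret = e.poll(timeout__s)
            # Stop on timeout (empty list) or when the event is not "readable".
            # The event field is a bitmask, so test it with & instead of `is`.
            if not ret or not ret[0][1] & EPOLLIN:
                break
            data = fd.read(1)
            if not data:  # EOF: the child closed its stdout
                break
            buf.append(data)
    finally:
        e.close()
    return ''.join(buf)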

process = subprocess.Popen(
    ["./prog.sh"],
    stdin = subprocess.PIPE,
    stdout = subprocess.PIPE
)

print(read_with_timeout(process.stdout, 1.5))
print('-----')
print(read_with_timeout(process.stdout, 3))

$ python2 prog.py 
6194
14508
11293

-----
10506

