How do I use subprocess.Popen to connect multiple processes by pipes?

Question

How do I execute the following shell command using the Python subprocess module?

echo "input data" | awk -f script.awk | sort > outfile.txt

The input data will come from a string, so I don't actually need echo. I've got this far; can anyone explain how I get it to pipe through sort too?

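import subprocess

# Python 3: file() no longer exists, so open the output file with open().
with open("outfile.txt", "w") as outfile:
    p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                             stdin=subprocess.PIPE, stdout=outfile,
                             text=True)  # lets communicate() accept a str
    p_awk.communicate("input data")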

UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!

Recommended answer

You'd be a little happier with the following.

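import subprocess

# The shell builds the awk | sort pipeline and the > redirection;
# Python only feeds the input data to the pipeline's stdin.
awk_sort = subprocess.Popen("awk -f script.awk | sort > outfile.txt",
                            stdin=subprocess.PIPE, shell=True)
awk_sort.communicate(b"input data\n")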

Delegate part of the work to the shell. Let it connect two processes with a pipeline.
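
The same delegation can also be written with subprocess.run() (Python 3.5+), which writes the input, closes stdin and waits in a single call. This is a sketch. The function name awk_sort_via_shell and its parameters are hypothetical. shlex.quote() is used because the file names are pasted into a shell command line, so names containing spaces or shell metacharacters still work.

```python
import shlex
import subprocess


def awk_sort_via_shell(input_data: bytes, awk_script: str, outpath: str) -> int:
    """Hypothetical helper: run `awk -f SCRIPT | sort > OUTPATH` via /bin/sh."""
    # Quote the file names because they become part of a shell command line.
    cmd = f"awk -f {shlex.quote(awk_script)} | sort > {shlex.quote(outpath)}"
    # input= implies stdin=PIPE; run() writes the data, closes stdin and waits.
    result = subprocess.run(cmd, shell=True, input=input_data)
    # Without `set -o pipefail`, a pipeline's exit status is that of its
    # last command, here sort.
    return result.returncode
```

Usage: awk_sort_via_shell(b"input data\n", "script.awk", "outfile.txt").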

You'd be a lot happier rewriting 'script.awk' into Python, eliminating awk and the pipeline.
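
Here is a minimal sketch of what that rewrite could look like. The question does not show script.awk, so transform() below is a hypothetical stand-in. It keeps the first whitespace-separated field of every non-blank line, which is roughly what awk '{print $1}' does, except that blank lines are skipped. Replace it with the script's real logic.

```python
from typing import Iterable, List, Optional


def transform(line: str) -> Optional[str]:
    """Hypothetical stand-in for script.awk: first field of a non-blank line."""
    fields = line.split()
    return fields[0] if fields else None


def awk_then_sort(input_data: str) -> List[str]:
    """Python equivalent of `awk -f script.awk | sort` on an in-memory string."""
    results = (transform(line) for line in input_data.splitlines())
    return sorted(r for r in results if r is not None)


def write_lines(lines: Iterable[str], path: str) -> None:
    """Equivalent of the final `> outfile.txt` redirection."""
    with open(path, "w") as outfile:
        for line in lines:
            outfile.write(line + "\n")
```

Usage: write_lines(awk_then_sort("input data"), "outfile.txt"). Note one difference: sorted() orders strings by code point. That matches sort run under LC_ALL=C, but not necessarily sort's default, locale-aware ordering.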

Edit: some of the reasons for suggesting that awk isn't helping.

[There are too many reasons to respond via comments.]

  1. Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.

  2. Pipelining from awk to sort may improve elapsed processing time for large sets of data; for short sets of data it has no significant benefit. A quick measurement of "awk > file; sort file" versus "awk | sort" will reveal whether concurrency helps. With sort it rarely helps, because sort is not a once-through filter: it has to read all of its input before it can write its first line of output.

  3. The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents exactly the kind of question being asked here.

  4. Python, while wordier than awk, is also explicit, whereas awk has certain implicit rules that are opaque to newbies and confusing to non-specialists.

  5. Awk (like the shell script itself) adds Yet Another Programming language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.
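
The quick measurement suggested above ("awk > file; sort file" versus "awk | sort") could be scripted along these lines. This is a sketch. elapsed() is a hypothetical helper that times one shell command line with time.perf_counter(). The file names in the usage note are placeholders.

```python
import subprocess
import time


def elapsed(cmd: str) -> float:
    """Wall-clock seconds taken to run one shell command line to completion."""
    start = time.perf_counter()
    subprocess.run(cmd, shell=True, check=True)  # raises if the command fails
    return time.perf_counter() - start
```

Usage, with placeholder input.txt and tmp.txt: compare elapsed("awk -f script.awk input.txt > tmp.txt; sort tmp.txt > out1.txt") with elapsed("awk -f script.awk input.txt | sort > out2.txt"). Repeat each a few times, because a single run is noisy.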

Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.

Sidebar: why building a pipeline (a | b) is so hard.

When the shell is confronted with a | b, it has to do the following.

  1. Fork a child process of the original shell. This will eventually become b.

  2. Build an OS pipe: not a Python subprocess.PIPE, but a call to os.pipe(), which returns two new file descriptors connected via a common buffer. At this point the process has stdin, stdout and stderr from its parent, plus a file that will be "a's stdout" and "b's stdin".

  3. Fork a child. The child replaces its stdout with the new "a's stdout" and execs the a process.

  4. The b child replaces its stdin with the new "b's stdin", closes the pipe descriptors it no longer needs, and execs the b process.

  5. The b child waits for a to complete: b keeps reading until a exits and closes its end of the pipe.

  6. The parent waits for b to complete.
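
Here is a minimal, POSIX-only sketch of the same mechanism that uses only os.pipe(), os.fork(), os.dup2() and os.execvp(). Its arrangement is slightly flatter than the steps above. The parent creates the pipe, forks both children itself and waits for both, so b is not the parent of a. The function name run_pipeline_fork and its output-file parameter are hypothetical; the file parameter makes the result easy to inspect.

```python
import os
from typing import Sequence


def run_pipeline_fork(cmd_a: Sequence[str], cmd_b: Sequence[str], outpath: str) -> int:
    """Run `cmd_a | cmd_b > outpath` with raw fork/exec; return cmd_b's exit code."""
    read_fd, write_fd = os.pipe()  # a writes into write_fd, b reads from read_fd

    pid_a = os.fork()
    if pid_a == 0:  # child: becomes a
        try:
            os.dup2(write_fd, 1)  # a's stdout is now the pipe's write end
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(cmd_a[0], list(cmd_a))
        finally:
            os._exit(127)  # only reached if execvp() failed

    pid_b = os.fork()
    if pid_b == 0:  # child: becomes b
        try:
            out_fd = os.open(outpath, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(read_fd, 0)  # b's stdin is now the pipe's read end
            os.dup2(out_fd, 1)  # b's stdout goes to the output file
            os.close(out_fd)
            os.close(read_fd)
            os.close(write_fd)  # otherwise b never sees EOF on its stdin
            os.execvp(cmd_b[0], list(cmd_b))
        finally:
            os._exit(127)

    # Parent: drop both pipe ends so that only the children hold them.
    os.close(read_fd)
    os.close(write_fd)
    os.waitpid(pid_a, 0)
    _, status = os.waitpid(pid_b, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+
```

The step that is easiest to forget is closing every copy of the write end that b does not need. If any copy stays open, b never sees end-of-file and the pipeline hangs.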

I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).

Since Python has os.pipe(), os.fork() and the os.exec*() family (such as os.execvp()), and you can replace the standard input and output file descriptors with os.dup2(), there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
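
One such shortcut is to chain subprocess.Popen objects so that each stage's stdout pipe becomes the next stage's stdin, with no shell involved. This is also the usual answer to the question as asked. It follows the "Replacing shell pipeline" recipe in the subprocess documentation. The sketch below generalises it to any number of stages, so it also covers a | b | c. run_pipeline and its parameters are hypothetical names.

```python
import subprocess
from typing import List, Sequence


def run_pipeline(commands: Sequence[Sequence[str]], input_data: bytes, outpath: str) -> List[int]:
    """Run cmd1 | cmd2 | ... | cmdN > outpath, feeding input_data to cmd1."""
    procs: List[subprocess.Popen] = []
    with open(outpath, "wb") as outfile:
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                # The first stage reads from us; later ones read the previous stage.
                stdin=subprocess.PIPE if i == 0 else procs[-1].stdout,
                stdout=outfile if is_last else subprocess.PIPE,
            )
            if i > 0:
                # The new child holds its own copy of this read end. Closing ours
                # lets the previous stage get SIGPIPE if this stage exits early.
                procs[-1].stdout.close()
            procs.append(proc)

        first = procs[0]
        try:
            first.stdin.write(input_data)
            first.stdin.close()  # flush, then EOF for the first stage
        except BrokenPipeError:
            pass  # the first stage exited without reading all of its input
        return [p.wait() for p in procs]
```

For the question's command this would be run_pipeline([["awk", "-f", "script.awk"], ["sort"]], b"input data\n", "outfile.txt"). It returns the exit code of every stage, which a plain shell pipeline does not report.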

However, it's easier to delegate that operation to the shell.