How do I use subprocess.Popen to connect multiple processes by pipes?

Question

How do I execute the following shell command using the Python subprocess module?

echo "input data" | awk -f script.awk | sort > outfile.txt

The input data will come from a string, so I don't actually need echo. I've got this far; can anyone explain how I get it to pipe through sort too?

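import subprocess

p_awk = subprocess.Popen(["awk", "-f", "script.awk"],
                         stdin=subprocess.PIPE,
                         stdout=open("outfile.txt", "w"))  # file() exists only in Python 2
p_awk.communicate(b"input data")  # bytes, because the pipe is opened in binary mode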

UPDATE: Note that while the accepted answer below doesn't actually answer the question as asked, I believe S.Lott is right and it's better to avoid having to solve that problem in the first place!

Accepted answer

You'd be a little happier with the following.

import subprocess

awk_sort = subprocess.Popen( "awk -f script.awk | sort > outfile.txt",
    stdin=subprocess.PIPE, shell=True )
awk_sort.communicate( b"input data\n" )

Delegate part of the work to the shell. Let it connect the two processes with a pipeline.

You'd be a lot happier rewriting 'script.awk' in Python, eliminating awk and the pipeline.
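
As a rough illustration of that rewrite, here is a minimal sketch. The contents of script.awk are not shown in the question, so `second_field` is a purely hypothetical stand-in (roughly `awk '{ print $2 }'`), and `process_and_sort` is an illustrative name, not anything from the original post. Note that `sorted()` orders strings by code point, which can differ from the locale-aware ordering of sort(1).

```python
def second_field(line):
    """Hypothetical stand-in for script.awk: emit the second whitespace-separated field."""
    fields = line.split()
    return fields[1] if len(fields) > 1 else ""


def process_and_sort(text, transform):
    """Apply `transform` to every input line, then return the results sorted."""
    return sorted(transform(line) for line in text.splitlines())


if __name__ == "__main__":
    # The input comes from a string, as in the question.
    for line in process_and_sort("b 2\na 3\nc 1\n", second_field):
        print(line)
```

Writing the result to outfile.txt is then an ordinary `open(...)` and `write(...)`, with no child processes involved.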

Edit: some of the reasons for suggesting that awk isn't helping.

[There are too many reasons to respond via comments.]

  1. Awk is adding a step of no significant value. There's nothing unique about awk's processing that Python doesn't handle.

  2. For large sets of data, pipelining from awk to sort may improve elapsed processing time. For short sets of data, it has no significant benefit. A quick measurement of awk >file ; sort file against awk | sort will reveal whether concurrency helps. With sort, it rarely helps because sort is not a once-through filter: it must read all of its input before it can write any output.

  3. The simplicity of "Python to sort" processing (instead of "Python to awk to sort") prevents the exact kind of question being asked here.

  4. Python, while wordier than awk, is also explicit where awk has certain implicit rules that are opaque to newbies and confusing to non-specialists.

  5. Awk (like the shell script itself) adds Yet Another Programming Language. If all of this can be done in one language (Python), eliminating the shell and the awk programming eliminates two programming languages, allowing someone to focus on the value-producing parts of the task.

Bottom line: awk can't add significant value. In this case, awk is a net cost; it added enough complexity that it was necessary to ask this question. Removing awk will be a net gain.

Sidebar: why building a pipeline (a | b) is so hard.

When the shell is confronted with a | b it has to do the following.

  1. Fork a child process of the original shell. This will eventually become b.

  2. Build an OS pipe (not a Python subprocess.PIPE): call os.pipe(), which returns two new file descriptors that are connected via a common buffer. At this point the process has stdin, stdout and stderr from its parent, plus the pipe that will be "a's stdout" and "b's stdin".

  3. Fork a child. The child replaces its stdout with the new a's stdout. Exec the a process.

  4. The b child replaces its stdin with the new b's stdin. Exec the b process.

  5. The b child waits for a to complete.

  6. The parent waits for b to complete.
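
The steps above can be sketched in Python roughly as follows. This is a simplified, flattened variant and not what any particular shell does: here the parent creates the pipe and forks both children itself, then waits for both, instead of nesting a under b. The function name `run_pipeline` is made up for illustration, and the code is POSIX-only because it relies on os.fork().

```python
import os


def run_pipeline(argv_a, argv_b):
    """Run the equivalent of `a | b` and return b's exit status (POSIX only)."""
    read_fd, write_fd = os.pipe()

    pid_a = os.fork()
    if pid_a == 0:
        # Child a: its stdout becomes the write end of the pipe.
        try:
            os.dup2(write_fd, 1)
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(argv_a[0], argv_a)
        finally:
            os._exit(127)  # reached only if the exec failed

    pid_b = os.fork()
    if pid_b == 0:
        # Child b: its stdin becomes the read end of the pipe.
        try:
            os.dup2(read_fd, 0)
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(argv_b[0], argv_b)
        finally:
            os._exit(127)  # reached only if the exec failed

    # Parent: close both ends, otherwise b never sees end-of-file.
    os.close(read_fd)
    os.close(write_fd)
    os.waitpid(pid_a, 0)
    _, status = os.waitpid(pid_b, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)
```

Forgetting to close the unused pipe ends in the parent or in either child is the classic mistake: b then blocks forever waiting for input that never ends.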

I think that the above can be used recursively to spawn a | b | c, but you have to implicitly parenthesize long pipelines, treating them as if they're a | (b | c).

Since Python has os.pipe(), os.fork() and the os.exec*() family of functions (there is no bare os.exec()), and you can replace sys.stdin and sys.stdout, there's a way to do the above in pure Python. Indeed, you may be able to work out some shortcuts using os.pipe() and subprocess.Popen.
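
One such shortcut, which also addresses the question as originally asked, is to hand the first process's stdout to the second process's stdin. The sketch below is one way to do it, not the answer author's code; `pipe_through` and its parameters are illustrative names. It writes the input directly instead of calling communicate(), and it assumes the first process reads all of its input (a process that exits early would raise BrokenPipeError on the write).

```python
import subprocess


def pipe_through(input_bytes, first_argv, second_argv, outfile):
    """Feed input_bytes to `first | second > outfile`; return both exit codes."""
    with open(outfile, "wb") as out:
        first = subprocess.Popen(first_argv,
                                 stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE)
        second = subprocess.Popen(second_argv,
                                  stdin=first.stdout,
                                  stdout=out)
        # Drop the parent's copy of the pipe so `first` gets SIGPIPE
        # if `second` exits early.
        first.stdout.close()

        first.stdin.write(input_bytes)
        first.stdin.close()  # signal end-of-input to the first process
        return first.wait(), second.wait()


# For the pipeline in the question, the call would look like:
# pipe_through(b"input data\n", ["awk", "-f", "script.awk"], ["sort"], "outfile.txt")
```

No shell is involved here, so there is no quoting to get wrong, at the cost of a few more lines than the shell=True version.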

However, it's easier to delegate that operation to the shell.
