Piping sometimes does not lead to immediate output

Question

I have observed a few times now that A | B | C may not lead to immediate output, although A is constantly producing output. I have no idea how this is even possible. From my understanding, all three processes ought to be working at the same time, putting their output into the next pipe (or stdout) and taking from the previous pipe when they are finished with one step.

Here's an example where I am currently experiencing that:

tcpflow -ec -i any port 8340 | tee second.flow | grep -i "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'

What should happen:

I watch one port for TCP packets. If something arrives, it should be in a certain XML format, and I want to grep the Manufacturer and the SerialNumber from these packets. I would also like to keep the full, unmodified output in a text file "second.flow" for later reference.

What actually happens:

Everything as desired, but instead of getting output every 10 seconds (I'm sure I get these outputs every ten seconds!) I have to wait for a long time and then a lot is printed at once. It's like one of the tools gobbles up everything in a buffer and only prints it if the buffer is full. I don't want that. I want to get each line as fast as possible.

If I replace tcpflow ... with a cat second.flow it works immediately. Can someone describe what's going on? And in case that it's obvious would there be another way to achieve the same result?

Recommended answer

Every layer in a series of pipes can involve buffering; by default, tools that don't specify buffering behavior for stdout will use line buffering when outputting to a terminal, and block buffering when outputting anywhere else (including piping to another program or a file). In a chained pipe, all but the last stage will see their output as not going to the terminal, and will block buffer.

So in your case, tcpflow might be producing output constantly, and if it's doing so, tee should be producing data almost at the same rate. But grep is going to limit that flow to a trickle, and won't produce output until that trickle exceeds the size of the output buffer. It's already performed the filtering and called fwrite or puts or printf, but the data is waiting for enough bytes to build up behind it before sending it along to awk, to reduce the number of (expensive) system calls.
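
The effect can be reproduced without tcpflow. The following is only an illustrative sketch, not part of the original answer: `slow_producer`, `stamp`, `delayed` and `prompt` are hypothetical helper names, and it assumes a grep that accepts `--line-buffered` (discussed further down).

```shell
#!/bin/sh
# slow_producer: hypothetical stand-in for tcpflow; emits one matching line per second.
slow_producer() {
    for i in 1 2 3; do
        echo "match $i"
        sleep 1
    done
}

# stamp: prefix every line read from stdin with the time at which it arrived.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# Block buffered: grep's stdout is a pipe, so its matches wait in the buffer
# until grep exits; the three lines typically carry the same timestamp.
delayed() { slow_producer | grep match | stamp; }

# Line buffered: grep flushes after each line, so the timestamps should be
# roughly one second apart.
prompt() { slow_producer | grep --line-buffered match | stamp; }
```

Running `delayed` and then `prompt` in a terminal and comparing the timestamps should show the difference. With a producer that never exits, as tcpflow does here, the `delayed` variant would only print once the buffer fills.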

cat second.flow produces output immediately because as soon as cat finishes producing output, it exits, flushing and closing its stdout in the process. That cascades down the pipeline: when each later stage finds its stdin at EOF, it exits too, flushing and closing its own stdout. tcpflow isn't exiting, so the cascade of EOFs and flushing isn't happening.

For some programs, in the general case, you can change the buffering behavior by using stdbuf (or unbuffer, though that can't do line buffering to balance efficiency, and has issues with piped input). If the program is using internal buffering, this still might not work, but it's worth a shot.
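
As a rough sketch of the stdbuf approach, assuming GNU coreutils is installed (`first_field` is a hypothetical helper name, and cut is only an example of a stage that uses default stdio buffering):

```shell
#!/bin/sh
# stdbuf options: -oL makes stdout line buffered, -o0 makes it unbuffered;
# -i and -e adjust stdin and stderr in the same way.
# first_field: print the first colon-separated field of each input line,
# flushing after every line even when stdout is a pipe.
first_field() { stdbuf -oL cut -d: -f1; }
```

A stage wrapped like this would sit in the middle of a pipeline, for example `some_producer | first_field | next_stage`, where both command names are placeholders.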

In your specific case, though, since it's likely grep that's causing the interruption (by only producing a trickle of output that is sticking in the buffer, where tcpflow and tee are producing a torrent, and awk is connected to stdout and therefore line buffered by default), you can just adjust your command line to:

tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'

At least for GNU grep, as found on Linux (the switch is not part of POSIX, though BSD grep accepts it too), that makes grep change its own output buffering to line-oriented buffering explicitly, which should remove the delay. If tcpflow itself is not producing enough output to flush regularly (you implied it did, but you could be wrong), you'd use stdbuf on it (but not on tee, which, per the stdbuf man page, manages its own buffering, so stdbuf has no effect on it) to make it line buffered:

stdbuf -oL tcpflow -ec -i any port 8340 | tee second.flow | grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" | awk -F'[<>]' '{print $3}'
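
To check the filtering stages without live traffic, sample lines can be fed in by hand. This is only a sketch: the XML below is made up for illustration (the real payload format is not shown in the question), and `extract_ids` and `sample_packet` are hypothetical helper names.

```shell
#!/bin/sh
# extract_ids: the grep and awk stages of the pipeline above, reading stdin.
extract_ids() {
    grep -i --line-buffered "\(</Manufacturer>\)\|\(</SerialNumber>\)" |
        awk -F'[<>]' '{print $3}'
}

# sample_packet: invented XML standing in for what tcpflow would capture.
sample_packet() {
    printf '%s\n' \
        '<Device>' \
        '<Manufacturer>Acme</Manufacturer>' \
        '<Model>X1</Model>' \
        '<SerialNumber>SN-0001</SerialNumber>' \
        '</Device>'
}

# With -F'[<>]' the tag name is field 2 and the element text is field 3.
sample_packet | extract_ids
```

For this sample input the two printed lines are `Acme` and `SN-0001`.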

Update from comments: It looks like some flavors of awk block buffer what they print to stdout, even when connected to a terminal. For mawk (the default on many Debian-based distros), you can non-portably disable that by passing the -Winteractive switch at invocation. Alternatively, to work portably, you can call system("") after each print, which forces output flushing across awk implementations (see the gawk manual: https://www.gnu.org/software/gawk/manual/html_node/I_002fO-Functions.html). Sadly, the obvious fflush() is not portable to older implementations of awk, but if you only care about modern awk, just use fflush() to be obvious and mostly portable.
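
The two flushing variants might look like this as the last stage of the pipeline. It is a sketch with hypothetical function names; the mawk-only switch is shown as a comment because it fails on other awks.

```shell
#!/bin/sh
# Flush after every print with fflush(); works in modern awks.
extract_fflush() { awk -F'[<>]' '{ print $3; fflush() }'; }

# Fallback for older awks: system("") runs an empty command, which makes
# awk flush its pending output first.
extract_system() { awk -F'[<>]' '{ print $3; system("") }'; }

# mawk only (non-portable):
#   mawk -Winteractive -F'[<>]' '{print $3}'
```

Either function can replace the final `awk -F'[<>]' '{print $3}'` stage of the command line above.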
