Bash Pipe Handling

Question

Does anyone know how bash handles sending data through pipes?

cat file.txt | tail -20

Does this command print all the contents of file.txt into a buffer, which is then read by tail? Or does this command, say, print the contents of file.txt line by line, and then pause at each line for tail to process, and then ask for more data?

The reason I ask is that I'm writing a program on an embedded device that basically performs a sequence of operations on some chunk of data, where the output of one operation is sent off as the input of the next operation. I would like to know how Linux (bash) handles this, so please give me a general answer, not specifically what happens when I run "cat file.txt | tail -20".

Thank you in advance for your responses!

Edit: Shog9 pointed out a relevant Wikipedia article. It didn't lead me directly to what I needed, but it helped me find this: http://en.wikipedia.org/wiki/Pipeline_%28Unix%29#Implementation which did have the information I was looking for.

I'm sorry for not making myself clear. Of course you're using a pipe and of course you're using stdin and stdout of the respective parts of the command. I had assumed that was too obvious to state.

What I'm asking is how this is handled/implemented. Since both programs cannot run at once, how is data sent from one program's stdout to the next program's stdin? What happens if the first program generates data significantly faster than the second program can consume it? Does the system just run the first command until either it terminates or its stdout buffer is full, then move on to the next program, and so on in a loop until no more data is left to be processed, or is there a more complicated mechanism?

Recommended Answer

I decided to write a slightly more detailed explanation.

The "magic" here lies in the operating system. Both programs start at roughly the same time and run concurrently, just like every other process on your computer (including the terminal application): the operating system assigns each of them slices of processor time. So, before any data gets passed, each process does whatever initialization it needs. In your example, tail parses the '-20' argument, and cat parses the 'file.txt' argument and opens the file.

At some point tail will get to the point where it needs input, and it will tell the operating system that it is waiting for input. At some other point (either before or after, it doesn't matter) cat will start passing data to the operating system through its stdout. This data goes into a buffer in the operating system. The next time tail gets a time slice after cat has put some data into the buffer, it will retrieve some or all of that data, which removes it from the buffer. When the buffer is empty, tail has to wait for cat to output more data. If cat outputs data much faster than tail handles it, the buffer fills up. The buffer has a fixed capacity, so once it is full the operating system makes cat wait until tail has read some of the data.

cat will eventually finish outputting data, probably while tail is still processing, so cat will exit and tail will work through whatever remains in the buffer. Once the buffer is empty and nothing has the write end of the pipe open any more, the operating system tells tail that there is no more incoming data by reporting end-of-file (EOF). In this case, tail is probably just receiving all the data into a circular buffer of 20 lines, and when the operating system signals that there is no more input, it dumps the last twenty lines to its own stdout, which is simply displayed in the terminal. Since tail has very little work to do per line, it will likely spend most of its time waiting for cat to put data into the buffer.
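The sequence described above can be sketched in Python, assuming a Unix-like system where os.fork is available. This is only an illustration of the mechanism, not how cat, tail, or bash are actually implemented; the function name tail_through_pipe is made up for the example. The child process plays the role of cat and the parent plays the role of tail.

```python
import os
from collections import deque


def tail_through_pipe(lines, n=20):
    """Fork a producer that writes `lines` into a pipe; return the last n lines read."""
    read_fd, write_fd = os.pipe()  # kernel-managed buffer with two ends
    pid = os.fork()
    if pid == 0:
        # Child: the "cat" side. It only writes, so it closes the read end.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                for line in lines:
                    out.write(line + "\n")  # blocks whenever the pipe is full
        finally:
            os._exit(0)  # leave without running the parent's cleanup code

    # Parent: the "tail" side. It must close its copy of the write end,
    # otherwise the pipe never reports EOF.
    os.close(write_fd)
    last = deque(maxlen=n)  # circular buffer holding at most n lines
    with os.fdopen(read_fd) as src:
        for line in src:  # blocks while the pipe is empty, stops at EOF
            last.append(line.rstrip("\n"))
    os.waitpid(pid, 0)  # reap the child
    return list(last)


if __name__ == "__main__":
    for line in tail_through_pipe((f"line {i}" for i in range(1000)), n=20):
        print(line)
```

Neither side contains any scheduling logic. The blocking reads and writes are what keep the two processes in step, which is the part the operating system handles for you.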

On a system with multiple processors, the two programs will not just be sharing alternating time slices on the same processor core, but likely running at the same time on separate cores.

To get into a little more detail, if you open some kind of process monitor (operating system specific) like 'top' in Linux, you will see a whole list of running processes, most of which are effectively using 0% of the processor. Most applications, unless they are crunching data, spend most of their time doing nothing. This is good, because it allows other processes to have unfettered access to the processor according to their needs.

This is accomplished in basically three ways. A process could get to a sleep(n) style instruction, where it basically tells the kernel to wait for the requested interval before giving it another time slice to work with. Most commonly, a program needs to wait for something from another program, like 'tail' waiting for more data to enter the buffer. In this case the operating system will wake up the process when more data is available. Lastly, the kernel can preempt a process in the middle of execution, giving some processor time slices to other processes.

'cat' and 'tail' are simple programs. In this example, tail spends most of its time waiting for more data in the buffer, and cat spends most of its time waiting for the operating system to retrieve data from the hard drive. The bottleneck is the speed (or slowness) of the physical medium that the file is stored on. The perceptible delay you might notice when you run this command for the first time is the time it takes for the read heads on the disk drive to seek to the position on the hard drive where 'file.txt' is. If you run the command a second time, the operating system will likely have the contents of file.txt cached in memory, and you will probably not see any perceptible delay (unless file.txt is very large, or the file is no longer cached).
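To make the second kind of waiting concrete, here is a small Python sketch that compares wall-clock time with CPU time while a reader is blocked on an empty pipe. The helper name blocked_read_cost and the 0.3 second delay are invented for the illustration, and a thread stands in for the other program to keep the example short.

```python
import os
import threading
import time


def blocked_read_cost(delay=0.3):
    """Block on an empty pipe until a late writer sends data.

    Returns (data, wall_seconds, cpu_seconds) for the blocking read.
    """
    read_fd, write_fd = os.pipe()

    def late_writer():
        time.sleep(delay)  # the "slow producer"
        os.write(write_fd, b"hello\n")
        os.close(write_fd)

    writer = threading.Thread(target=late_writer)
    writer.start()

    wall_start = time.monotonic()
    cpu_start = time.process_time()
    data = os.read(read_fd, 1024)  # sleeps in the kernel until data arrives
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start

    writer.join()
    os.close(read_fd)
    return data, wall_used, cpu_used


if __name__ == "__main__":
    data, wall, cpu = blocked_read_cost()
    print(f"read {data!r}: waited {wall:.3f}s of wall time, used {cpu:.3f}s of CPU")
```

The wall-clock wait should be close to the delay, while the CPU time should be a small fraction of it: the reader is not spinning, it is parked until the kernel wakes it.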

Most operations you do on your computer are I/O bound, which is to say that you are usually waiting for data to come from your hard drive, from a network device, and so on.
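Coming back to the question of a producer that outruns its consumer: the pipe's buffer has a finite capacity, and a writer blocks once it is full. A rough Python sketch for measuring that capacity on your own system follows. The function name measure_pipe_capacity is made up, and the write end is switched to non-blocking mode purely so the script can detect the full condition instead of hanging.

```python
import os


def measure_pipe_capacity(chunk_size=512):
    """Fill a pipe that nobody reads and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # a full pipe now raises instead of blocking
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # the pipe is full; a blocking writer would be put to sleep here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print(f"pipe accepted {measure_pipe_capacity()} bytes before filling up")
```

On Linux the default is commonly 65536 bytes, but it varies by system and configuration, so a program on an embedded device should measure rather than assume. In a normal blocking pipeline you never see this limit directly: the producer simply pauses until the consumer catches up.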
