grep如何运行如此之快? [英] How does grep run so fast?

查看:94
本文介绍了grep如何运行如此之快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对shell中的GREP的功能感到非常惊讶,之前我曾经在java中使用substring方法,但现在我使用GREP,它在几秒钟内执行,它比我使用的java代码快得多(据我的经验,我可能是错误的)尽管如此,我还没有弄清楚它是如何发生的?网上也没有太多可用的东西。



任何人都可以帮我解决这个问题吗?

解决方案

假设您的问题专门针对 GNU grep 。以下是作者Mike Haertel的一篇文章:
$ b


GNU grep速度很快,因为它避免了每个输入字节。



GNU grep速度很快,因为它对每个
BYTE执行了很多指令,即
不会看
$ b

GNU grep使用了着名的Boyer-Moore算法,该算法首先在目标字符串的最后一个字母中查找
,并使用查找表对
进行了说明远远超前它可以跳过输入,只要它发现
不匹配的字符。



GNU grep还展开Boyer-Moore的内部循环,并设置增加
Boyer-Moore增量表的条目,以便它不需要
在每个展开的步骤中执行循环退出测试。这样做的结果是
,在这个极限内,GNU grep平均每个输入字节的实际查看次数少于3个x86指令
(并且它完全跳过了许多
字节)。



GNU grep使用原始的Unix输入系统调用,并在读取数据后避免复制数据
。而且,GNU grep AVOIDS将输入分为
行。寻找换行符会使grep下降几倍,因为要找到换行符,它必须查看每个字节的



因此,GNU grep不是使用面向行的输入,而是将原始数据读入
a大缓冲区,使用Boyer-Moore搜索缓冲区,并且只有当
找到匹配时才会去查找边界换行
(某些命令行选项,如
-n禁用此优化。)

这个答案是一个子集从此处获取的信息。

I am really amazed by the functionality of GREP in shell, earlier I used to use substring method in java but now I use GREP for it and it executes in a matter of seconds, it is blazingly faster than java code that I used to write.(according to my experience I might be wrong though)

That being said I have not been able to figure out how it is happening? there is also not much available on the web.

Can anyone help me with this?

解决方案

Assuming your question regards GNU grep specifically. Here's a note from the author, Mike Haertel:

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it does look at.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).

GNU grep uses raw Unix input system calls and avoids copying data after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines (Certain command line options like -n disable this optimization.)

This answer is a subset of the information taken from here.

这篇关于grep如何运行如此之快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆