grep 怎么跑得这么快? [英] How does grep run so fast?

查看:15
本文介绍了grep 怎么跑得这么快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我真的对 shell 中的 GREP 功能感到惊讶,之前我曾经在 java 中使用 substring 方法,但现在我使用 GREP 并且它在几秒钟内执行,它比我使用的 java 代码快得多写.(根据我的经验,我可能是错的)

I am really amazed by the functionality of GREP in shell, earlier I used to use substring method in java but now I use GREP for it and it executes in a matter of seconds, it is blazingly faster than java code that I used to write.(according to my experience I might be wrong though)

话虽如此,我一直无法弄清楚它是如何发生的?网络上也没有太多可用的东西.

That being said I have not been able to figure out how it is happening? there is also not much available on the web.

谁能帮我解决这个问题?

Can anyone help me with this?

推荐答案

假设您的问题特别针对 GNU grep.这是作者 Mike Haertel 的注释:

Assuming your question regards GNU grep specifically. Here's a note from the author, Mike Haertel:

GNU grep 速度很快,因为它避免查看每个输入字节.

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep 速度很快,因为它只为每个指令执行很少的指令字节它确实看看.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it does look at.

GNU grep 使用了著名的 Boyer-Moore 算法,它首先看起来为目标字符串的最后一个字母,并使用查找表告诉它可以在输入中跳过多远,只要它找到不匹配的字符.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

GNU grep 还展开 Boyer-Moore 的内部循环,并设置Boyer-Moore delta 表条目以不需要的方式在每个展开的步骤中进行循环退出测试.这样做的结果是也就是说,在极限情况下,GNU grep 平均少于 3 个 x86 指令为它实际查看的每个输入字节执行(并跳过许多字节完全).

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).

GNU grep 使用原始 Unix 输入系统调用并避免复制数据阅读后.此外,GNU grep 避免将输入中断线.寻找换行符会减慢 grep 的速度好几次,因为要找到换行符,它必须查看每个字节!

GNU grep uses raw Unix input system calls and avoids copying data after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

因此,GNU grep 不是使用面向行的输入,而是将原始数据读入一个大缓冲区,使用 Boyer-Moore 搜索缓冲区,并且仅当它找到一个匹配项并寻找边界换行符(某些命令行选项,如-n 禁用此优化.)

So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines (Certain command line options like -n disable this optimization.)

此答案是从此处获取的信息的子集.

This answer is a subset of the information taken from here.

这篇关于grep 怎么跑得这么快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆