Fast way to find a string in a file in Unix

Question

I want to find a string pattern in a file on Unix. I use the command below:

$ grep 2005057488 filename

But the file contains millions of lines and I have many such files. What is the fastest way to find the pattern, other than grep?

Recommended answer

grep is generally as fast as it gets. It's designed to do one thing and one thing only, and it does it very well. You can read why here.

However, to speed things up there are a couple of things you could try. Firstly, it looks like the pattern you're looking for is a fixed string. Fortunately, grep has a 'fixed-strings' option:

-F, --fixed-strings
       Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)

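Secondly, because grep is often much slower in a UTF-8 locale, you could try disabling national language support (NLS) by setting LANG=C in the environment. Note that LC_ALL takes precedence over LANG, so if LC_ALL is already set you would need LC_ALL=C instead. Therefore, you could try this concoction: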

LANG=C grep -F "2005057488" file
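To make the effect of -F concrete, here is a minimal sketch. The sample file and its contents are made up for illustration, and LC_ALL=C is used rather than LANG=C on the assumption that LC_ALL might already be set in your environment.

```shell
# Hypothetical sample data, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'id=1.5' 'id=1x5' 'id=2005057488' > "$sample"

# As a regular expression, "." matches any character,
# so both "1.5" and "1x5" count as matches.
regex_hits=$(LC_ALL=C grep -c '1.5' "$sample")

# With -F the pattern is a literal string, so only "1.5" matches.
fixed_hits=$(LC_ALL=C grep -c -F '1.5' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits"   # prints "regex: 2, fixed: 1"
rm -f "$sample"
```

For a purely numeric pattern such as 2005057488 the matches are the same either way. -F simply tells grep up front that no regular expression handling is needed.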

Thirdly, it wasn't clear in your question, but if you're only trying to find out whether something occurs at least once in your file, you could also limit the number of matches grep looks for. With -m 1, grep quits immediately after the first occurrence is found. Your command could now look like this:

LANG=C grep -m 1 -F "2005057488" file
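Since the question mentions many files, the following sketch shows -m 1 next to -l, which prints only the names of files that contain a match. The directory and file names are hypothetical. -m is supported by GNU and BSD grep but is not part of POSIX, whereas -l is.

```shell
# Hypothetical sample files, created only for this illustration.
dir=$(mktemp -d)
printf '%s\n' 'a 2005057488' 'b 2005057488' 'c 2005057488' > "$dir/one.txt"
printf '%s\n' 'nothing to see' > "$dir/two.txt"

# Without -m, grep reads the whole file and counts every matching line.
all_hits=$(LC_ALL=C grep -c -F '2005057488' "$dir/one.txt")

# With -m 1, grep stops reading after the first matching line.
first_line=$(LC_ALL=C grep -m 1 -F '2005057488' "$dir/one.txt")

# -l lists only the files that contain at least one match.
files_with_hit=$(LC_ALL=C grep -l -F '2005057488' "$dir"/*.txt)

echo "$all_hits"
echo "$first_line"
echo "$files_with_hit"
rm -rf "$dir"
```

GNU grep also stops scanning each file at its first match when -l is given, so it may be the simpler choice when only the file names matter.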

Finally, if you have a multicore CPU, you could give GNU parallel a go. It even comes with an explanation of how to use it with grep. To run 1.5 jobs per core and give 1000 arguments to grep:

find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
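If GNU parallel is not installed, a similar fan-out can be sketched with xargs. This is a different tool swapped in for illustration, not what the answer uses, and -P is a GNU/BSD extension rather than POSIX. The sample tree and the choice of four processes are assumptions.

```shell
# Hypothetical sample tree, created only for this illustration.
dir=$(mktemp -d)
printf '%s\n' 'x STRING y' > "$dir/a.log"
printf '%s\n' 'no match'   > "$dir/b.log"
printf '%s\n' 'STRING'     > "$dir/c.log"

# -print0 / -0 keep file names containing spaces or newlines intact.
# -P 4 runs up to four grep processes at once.
# -n 1000 passes at most 1000 file names to each grep invocation.
# xargs exits non-zero if any grep found nothing, hence "|| true".
matches=$(find "$dir" -type f -print0 |
  LC_ALL=C xargs -0 -P 4 -n 1000 grep -F -H -n 'STRING' || true)

echo "$matches"
rm -rf "$dir"
```

Unlike parallel -k, xargs makes no promise about output order, and lines from concurrent grep processes may interleave, so sort the result if order matters.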

To grep a big file in parallel use --pipe:

< bigfile parallel --pipe grep STRING

Depending on your disks and CPUs it may be faster to read larger blocks:

< bigfile parallel --pipe --block 10M grep STRING
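As a quick sanity check, the sketch below compares a single grep against the --pipe variant on a generated file. It assumes GNU parallel is installed and skips the comparison otherwise. The file contents, the pattern, and the 1M block size are made up for illustration.

```shell
# Hypothetical sample file (about 1.3 MB), created only for this illustration.
bigfile=$(mktemp)
seq 1 200000 > "$bigfile"

# Baseline: one grep process reads the whole file.
serial=$(LC_ALL=C grep -c -F '12345' "$bigfile")

# --pipe splits stdin into blocks on line boundaries and feeds each
# block to its own grep; -k keeps the output in input order.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  chunked=$(LC_ALL=C parallel --pipe --block 1M -k grep -F '12345' < "$bigfile" | wc -l)
else
  chunked=$serial   # GNU parallel not available; comparison skipped
fi

echo "serial: $serial, chunked: $chunked"
rm -f "$bigfile"
```

Because each grep sees only its own block, options such as -n would report line numbers relative to the block rather than to the whole file.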
