grep -f alternative for huge files


Question


grep -F -f file1 file2

file1 is 90 MB (2.5 million lines, one word per line)

file2 is 45 GB

That command doesn't actually produce anything whatsoever, no matter how long I leave it running. Clearly, this is beyond grep's scope.

It seems grep can't handle that many queries from the -f option. However, the following command does produce the desired result:

head file1 > file3
grep -F -f file3 file2
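
Since a truncated pattern list does work, one workaround (a sketch of my own, not from the original thread) is to split file1 into chunks small enough for grep to handle and run it once per chunk:

# Split the 2.5M-line pattern file into chunks grep can digest; the
# 100000-line chunk size is a guess and worth tuning for your machine.
split -l 100000 file1 file1.part.
for f in file1.part.*; do
  grep -F -f "$f" file2
done > output

A line of file2 that matches patterns from more than one chunk will be printed once per chunk, so pipe the loop through sort -u if duplicates matter.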

I have doubts about whether sed or awk would be appropriate alternatives either, given the file sizes.
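
For reference, the usual awk approach hashes the words instead of feeding grep millions of patterns; a minimal sketch, assuming a match means a whole whitespace-separated field of file2 appears in file1 (substring matches would need different handling):

# Read file1 first (NR == FNR), remembering each word as a hash key;
# then print any line of file2 whose fields include one of those words.
awk 'NR == FNR { words[$0]; next }
     { for (i = 1; i <= NF; i++) if ($i in words) { print; next } }' file1 file2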

I am at a loss for alternatives... please help. Is it worth it to learn some SQL commands? Is it easy? Can anyone point me in the right direction?

Solution

Try using LC_ALL=C. It switches pattern matching from UTF-8 to plain ASCII, which sped up the search by about 140 times over the original speed. I have a 26 GB file that used to take around 12 hours; it now finishes in a couple of minutes. Source: Grepping a huge file (80GB) any way to speed it up?

So what I do is:

LC_ALL=C fgrep "pattern" <input >output
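
Applied to the pattern-file invocation from the question, the same locale trick would presumably look like this (my extrapolation, not part of the quoted answer; fgrep and grep -F are equivalent):

# Assumed combination of the locale fix with the original -f command.
LC_ALL=C grep -F -f file1 file2 > output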
