grep -f maximum number of patterns?


Question

I'd like to use grep with -f on a text file to match a long list (10,000) of patterns. It turns out that grep doesn't like this (who knew?). After a day, it hadn't produced anything. Smaller lists work almost instantaneously.

I was thinking I might split my long list up and run grep a few times. Any idea what a good maximum length for the pattern list might be?

Also, I'm rather new to unix, so alternative approaches are welcome. The patterns, or search terms, are in a plaintext file, one per line.
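The splitting idea from the question can be sketched as below. This is a minimal demo, not a recommendation from the original answer: the file names (patterns.txt, bigfile.txt) and the chunk size are assumptions for illustration, and -F is used because the search terms are literal strings rather than regexes.

```shell
# Demo inputs (file names are assumptions for illustration):
printf 'apple\nbanana\ncherry\n' > patterns.txt
printf 'I like apple pie\nno fruit here\nbanana split\n' > bigfile.txt

# Split the pattern list into fixed-size chunks (a few thousand lines
# each is a plausible real-world size; 2 here so the demo exercises
# two chunks), then run one grep pass per chunk. -F treats the patterns
# as fixed strings, which avoids regex overhead for literal terms.
split -l 2 patterns.txt chunk_
for c in chunk_*; do
    grep -F -f "$c" bigfile.txt
done > matches.txt

cat matches.txt   # → "I like apple pie" and "banana split"
rm chunk_*
```

Note that lines matched by patterns in different chunks come out in chunk order, so duplicates are possible if a line matches patterns from more than one chunk; piping the result through `sort -u` would deduplicate it.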

Thanks for any guidance.

Answer

I ran into roughly the same problem: about 4 million patterns to search for in a file with 9 million lines. It seems to be a RAM problem, so I came up with this neat little workaround, which may be slower than splitting and joining but needs only this one line:

 # read -r and quoting "$line" keep backslashes, spaces, and
 # shell metacharacters in the patterns intact
 while read -r line; do grep -- "$line" fileToSearchIn; done < patternFile

I needed the workaround because the -F flag was no solution for files that large...

This still seems really slow for large files, though. After some more research I found 'faSomeRecords' and other really awesome tools in Kent's NGS-editing-Tools.

I tried it myself by extracting 2 million fasta records from a 5.5-million-record file. It took approx. 30 sec.
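If the Kent binaries are unavailable, the same kind of record extraction can be approximated with awk. This is a rough sketch, not the faSomeRecords tool itself; records.fa and names.txt are assumed example names:

```shell
# Demo inputs (file names are assumptions for illustration):
printf '>seq1\nACGT\n>seq2\nTTTT\n>seq3\nGGCC\n' > records.fa
printf 'seq1\nseq3\n' > names.txt

# Extract fasta records whose header name appears in names.txt.
awk 'NR==FNR { want[">"$1]=1; next }   # first file: load wanted names
     /^>/    { keep = ($1 in want) }   # header line: decide keep/skip
     keep' names.txt records.fa > subset.fa

cat subset.fa   # → seq1 and seq3 records only
```

Loading the name list into an awk array mirrors why grep -f struggles here: the whole pattern set sits in memory, but hash lookups per header are far cheaper than matching every pattern against every line.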

Cheers


