grep -f maximum number of patterns?
I'd like to use grep on a text file with -f to match a long list (10,000) of patterns. Turns out that grep doesn't like this (who knew?). After a day, it didn't produce anything. Smaller lists work almost instantaneously.
I was thinking I might split my long list up and do it a few times. Any idea what a good maximum length for the pattern list might be?
Also, I'm rather new to unix, so alternative approaches are welcome. The list of patterns, or search terms, is in a plaintext file, one per line.
Thank you everyone for your guidance.
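One way to do the splitting, sketched here with hypothetical filenames `patterns.txt` and `data.txt` (chunks of 1,000 patterns per grep run, and `-F` since the search terms are plain strings rather than regexes):

```shell
# Split the pattern list into 1,000-line chunks and run grep once per chunk.
# -F treats each pattern as a fixed string, which scales far better than regexes.
split -l 1000 patterns.txt chunk_
for f in chunk_*; do
    grep -F -f "$f" data.txt
done | sort -u        # de-duplicate lines matched by more than one chunk
rm -f chunk_*
```

The `sort -u` at the end only matters if a line of `data.txt` can match patterns from more than one chunk; drop it if duplicates are acceptable.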
I ran into roughly the same problem: about 4 million patterns to search for in a file with 9 million lines. It seems to be a RAM problem, so I came up with this neat little workaround, which may be slower than splitting and joining, but it only needs this one line:
while read line; do grep -- "$line" fileToSearchIn; done < patternFile
I needed the workaround because the -F flag is not a solution for files that large...
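One way to speed up the one-pattern-per-grep loop above (a sketch, assuming GNU xargs and the same filenames) is to run several greps in parallel:

```shell
# Feed patternFile to xargs, one pattern per grep, up to 4 greps at a time.
# -F matches fixed strings; -- stops option parsing in case a pattern starts with '-'.
xargs -a patternFile -d '\n' -P 4 -I{} grep -F -- {} fileToSearchIn
```

With -P the matched lines come back in no particular order, so pipe through `sort` afterwards if order matters.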
EDIT: This seems to be really slow for large files. After some more research I found 'faSomeRecords' and other awesome tools in Kent's NGS-editing-Tools.
I tried it myself by extracting 2 million fasta records from a 5.5 million record file. It took approx. 30 sec.
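For reference, the same kind of extraction can be done portably with awk (a sketch with hypothetical filenames `ids.txt` and `records.fa`, assuming simple `>id` header lines; faSomeRecords will be faster on files this size):

```shell
# Print only the fasta records whose header ID is listed in ids.txt.
awk 'NR == FNR { want[$1]; next }                 # first file: collect wanted IDs
     /^>/      { keep = (substr($1, 2) in want) } # header line: keep or skip record
     keep                                         # print lines of kept records
' ids.txt records.fa > subset.fa
```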
cheers
EDIT: direct download link