grep -f maximum number of patterns?

Problem description

I'd like to use grep with -f on a text file to match a long list (10,000) of patterns. It turns out that grep doesn't like this (who knew?). After a day, it still hadn't produced anything. Smaller lists work almost instantaneously.

I was thinking I might split my long list up and do it a few times. Any idea what a good maximum length for the pattern list might be?
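
To make that concrete, the splitting I have in mind is something like the sketch below (the 1,000-line chunk size is a guess, and patterns.txt / data.txt stand in for my real files):

    # Sketch of the splitting idea: break the pattern list into 1,000-line
    # chunks, run grep -f once per chunk, and merge the matches.
    # sort -u removes lines that matched patterns from more than one chunk.
    split -l 1000 patterns.txt chunk_
    for f in chunk_*; do
        grep -f "$f" data.txt
    done | sort -u > matches.txt
    rm chunk_*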

Also, I'm rather new to unix. Alternative approaches are welcome. The list of patterns, or search terms, is in a plaintext file, one per line.

Thank you everyone for your guidance.

Solution

I ran into roughly the same problem: approx. 4 million patterns to search for in a file with 9 million lines. It seems to be a RAM problem, so I came up with this neat little workaround, which might be slower than splitting and joining but needs only this one line:

    # Quoting "$line" keeps patterns containing spaces or glob characters intact.
    while read -r line; do grep "$line" fileToSearchIn; done < patternFile

I needed this workaround since the -F flag is no solution for files that large...
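
If the patterns are exact whole lines rather than regexes, a different route that should stay within RAM better than grep -f is a single awk pass that keeps the patterns in a hash (an untested sketch on my part, and it only handles full-line matches):

    # Untested sketch: the first pass loads every pattern line as a hash key;
    # the second pass prints lines of the data file that appear in the hash.
    # This only works when each pattern must match a whole line exactly.
    awk 'NR==FNR { pats[$0]; next } $0 in pats' patternFile fileToSearchIn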

EDIT: This seems to be really slow for large files. After some more research I found 'faSomeRecords' and other really awesome tools from Kent's NGS-editing-Tools.

I tried it out myself by extracting 2 million FASTA records from a 5.5 million record file. It took approx. 30 seconds.
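
For anyone else trying it: if I remember the tool's usage correctly, it takes the input FASTA, a file listing record names, and the output file as three positional arguments (the file names below are made up):

    # Pull the records named (one per line) in recordNames.txt out of
    # allRecords.fa and write them to extracted.fa. File names are placeholders.
    faSomeRecords allRecords.fa recordNames.txt extracted.fa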

cheers

EDIT: direct download link
