算法来查找多个字符串匹配 [英] Algorithm to find multiple string matches
问题描述
我在寻找一个有效的算法寻找所有匹配的大段文字的建议。条款来搜索将被包含在列表中,并且可以具有1000 +可能性。搜索条件可以是1个或多个字。
I'm looking for suggestions for an efficient algorithm for finding all matches in a large body of text. Terms to search for will be contained in a list and can have 1000+ possibilities. The search terms may be 1 or more words.
很显然,我可以通过比对每一个搜索词的文本进行多次传递。不是太有效率。
Obviously I could make multiple passes through the text comparing against each search term. Not too efficient.
我想过订货的搜索字词,结合公共子段。这样,我可以迅速消除大量的术语。语言是C ++,我可以使用boost。
I've thought about ordering the search terms and combining common sub-segments. That way I could eliminate large numbers of terms quickly. Language is C++ and I can use boost.
搜索词的例子可能是世界500强企业名称的列表。
An example of search terms could be a list of Fortune 500 company names.
想法?
推荐答案
这个问题已被深入的研究。奇怪的是,搜索一个模式/串最好的算法,不要轻易推断多串匹配。
Don´t reinvent the wheel
This problem has been intensively researched. Curiously, the best algorithms for searching ONE pattern/string do not extrapolate easily to multi-string matching.
在的grep的家庭实现了非常有效的方式多字符串搜索。如果你可以使用它们作为外部程序,做到这一点。
The "grep" family implement the multi-string search in a very efficient way. If you can use them as external programs, do it.
如果你真的需要实现的算法,我认为最快的方法是复制什么agrep做(agrep擅长多串匹配!)。 这里是源和可执行文件。
In case you really need to implement the algorithm, I think the fastest way is to reproduce what agrep does (agrep excels in multi-string matching!). Here are the source and executable files.
在这里, 你会发现一个文件,介绍了常用的算法,理论背景,和大量的信息和关于字符串匹配指针。
And here you will find a paper describing the algorithms used, the theoretical background, and a lot of information and pointers about string matching.
务必注意:多字符串匹配已经被人们所喜爱克努特,博耶,摩尔,巴埃萨,耶茨,以及其他大量研究。如果你需要一个非常快的算法不犹豫的站在自己宽阔的肩膀。不要重新发明轮子。
A note of caution: multiple-string matching have been heavily researched by people like Knuth, Boyer, Moore, Baeza-Yates, and others. If you need a really fast algorithm don't hesitate on standing on their broad shoulders. Don't reinvent the wheel.
这篇关于算法来查找多个字符串匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!