最好的字符串搜索算法 [英] Best string search algorithm around
问题描述
我有一个代码,在其中我比较一个大数据,例如一个网页的源文件中的某些单词。
可以有两种情况:
-
如果我有大量的词与源进行比较,在这种情况下,对于正常的字符串搜索算法,它将需要一个字,与数据进行比较,取下一个,并与数据等等,直到所有完成。
- 我在文件中只有几个字,正常的字符串搜索会正常,但仍然需要
什么算法最好?我知道Boyer-Moore和Rabin-Karp搜索算法。
虽然Boyer-Moore的搜索似乎很快,但我也喜欢其他算法的名称和比较。
两种情况下,我想你可能想要构建一个patricia trie(也称为基数树)。最重要的是,查找时间将是O(k),其中k是字符串中字符串的最大长度。
I have a code where in i compare a large data, say a source of a web page against some words in a file. What is the best algorithm to be used?
There can be 2 scenarios:
If I have a large amount of words to compare against the source, In which case, for a normal string search algorithm, it would have to take a word, compare against the data, take the next and compare against the data and so on until all is complete.
I have only a couple of words in the file and the normal string search would be ok, but still need to reduce the time as much as possible.
What algorithm is best? I know about Boyer-Moore and also Rabin-Karp search algorithms. Although Boyer-Moore search seems fast, I would also like names of other algorithms and their comparisons.
In both cases, I think you probably want to construct a patricia trie (also called radix tree). Most importantly, lookup time would be O(k), where k is the max length of a string in the trie.
这篇关于最好的字符串搜索算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!