使用拉宾，卡普搜索多个字符串中的模式 [英] Using Rabin-Karp to search for multiple patterns in a string

查看：247 发布时间：2015/11/30 14:58:57 algorithm hash language-agnostic string-matching rabin-karp

本文介绍了使用拉宾，卡普搜索多个字符串中的模式的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

据对拉宾，卡普字符串匹配算法的百科词条，它可以被用来寻找几个在一个字符串的同时不同的图案，同时仍保持线性的复杂性。很显然，这是很容易做时，所有的模式都是一样的长度，但我还是不明白，我们怎么能对图案同时与不同长度进行搜索时，preserve O（n）的复杂性。是否有人可以提供一些线索这光？

According to the wikipedia entry on Rabin-Karp string matching algorithm, it can be used to look for several different patterns in a string at the same time while still maintaining linear complexity. It is clear that this is easily done when all the patterns are of the same length, but I still don't get how we can preserve O(n) complexity when searching for patterns with differing length simultaneously. Can someone please shed some light on this?

编辑（2011年12月）：

Edit (December 2011):

维基百科的文章进行了更新，不再声称匹配O（n）的不同长度的多种模式。

The wikipedia article has since been updated and no longer claims to match multiple patterns of differing length in O(n).

推荐答案

我不知道这是否是正确的答案，但无论如何：

在构建的散列值，我们可以为您在该组字符串散列的匹配。阿卡，在电流的哈希值。散列函数/ code通常被实现为一个循环，而循环中，我们可以将我们的快速查找。

当然，我们必须跌倒 M 有从琴弦组的最大字符串长度。

更新：维基百科，

I'm not sure if this is the correct answer, but anyway:

While constructing the hash value, we can check for a match in the set of string hashes. Aka, the current hash value. The hash function/code is usually implemented as a loop and inside that loop we can insert our quick look up.

Of course, we must pick m to have the maximum string length from the set of strings.

Update: From Wikipedia,

[...]
for i from 1 to n-m+1
         if hs ∈ hsubs
             if s[i..i+m-1] = a substring with hash hs
                 return i
         hs := hash(s[i+1..i+m]) // <---- calculating current hash
[...]

我们计算的电流的哈希值 M 步骤。在每一步有一个的临时的散列值，我们可以看看（O（1）复杂性）的散列集。所有的哈希值将具有相同的大小，即32位。

We calculate current hash in m steps. On each step there is a temporary hash value that we can look up ( O(1) complexity ) in the set of hashes. All hashes will have the same size, ie 32 bit.

更新2：的摊销（平均）O（n）的时间复杂度？

上面我说的是 M 必须具有最大字符串长度。事实证明，我们可以利用相反。
随着<一href="http://en.wikipedia.org/wiki/Rabin-Karp%5Fstring%5Fsearch%5Falgorithm#Use%5Fof%5Fhashing%5Ffor%5Fshifting%5Fsubstring%5Fsearch"相对=nofollow>散列，用于移动字符串搜索和固定 M 尺寸就可以达到O（n）的复杂性。

如果我们有不同的长度，我们可以设置 M 到最小字符串长度。此外，在散列集，我们不散列与整个字符串，但与第一m个字符的关联。
现在，虽然搜索的文本，我们检查当前哈希散列集，我们检查了匹配相关联的弦。

这种技术将增加误报但平均有O（n）的时间复杂度。

Update 2: an amortized (average) O(n) time complexity ?

Above I said that m must have the maximum string length. It turns out that we can exploit the opposite.
With hashing for shifting substring search and a fixed m size we can achieve O(n) complexity.

If we have variable length strings we can set m to the minimum string length. Additionally, in the set of hashes we don't associate a hash with the whole string but with the first m-characters of it.
Now, while searching the text we check if the current hash is in the hash set and we examine the associated strings for a match.

This technique will increase the false alarms but on average it has O(n) time complexity.

这篇关于使用拉宾，卡普搜索多个字符串中的模式的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用拉宾，卡普搜索多个字符串中的模式 [英] Using Rabin-Karp to search for multiple patterns in a string

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录关闭

使用拉宾，卡普搜索多个字符串中的模式 [英] Using Rabin-Karp to search for multiple patterns in a string

问题描述

推荐答案

相关文章

C/C++最新文章

热门教程

热门工具

登录 关闭

登录关闭