使用拉宾,卡普搜索多个字符串中的模式 [英] Using Rabin-Karp to search for multiple patterns in a string

查看:247
本文介绍了使用拉宾,卡普搜索多个字符串中的模式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据对拉宾,卡普字符串匹配算法的百科词条,它可以被用来寻找几个在一个字符串的同时不同的图案,同时仍保持线性的复杂性。很显然,这是很容易做时,所有的模式都是一样的长度,但我还是不明白,我们怎么能对图案同时与不同长度进行搜索时,preserve O(n)的复杂性。是否有人可以提供一些线索这光?

According to the wikipedia entry on Rabin-Karp string matching algorithm, it can be used to look for several different patterns in a string at the same time while still maintaining linear complexity. It is clear that this is easily done when all the patterns are of the same length, but I still don't get how we can preserve O(n) complexity when searching for patterns with differing length simultaneously. Can someone please shed some light on this?

编辑(2011年12月):

Edit (December 2011):

维基百科的文章进行了更新,不再声称匹配O(n)的不同长度的多种模式。

The wikipedia article has since been updated and no longer claims to match multiple patterns of differing length in O(n).

推荐答案

我不知道这是否是正确的答案,但无论如何:

在构建的散列值,我们可以为您在该组字符串散列的匹配。阿卡,在电流的哈希值。散列函数/ code通常被实现为一个循环,而循环中,我们可以将我们的快速查找。

当然,我们必须跌倒 M 有从琴弦组的最大字符串长度。

更新:维基百科,

I'm not sure if this is the correct answer, but anyway:

While constructing the hash value, we can check for a match in the set of string hashes. Aka, the current hash value. The hash function/code is usually implemented as a loop and inside that loop we can insert our quick look up.

Of course, we must pick m to have the maximum string length from the set of strings.

Update: From Wikipedia,

[...]
for i from 1 to n-m+1
         if hs ∈ hsubs
             if s[i..i+m-1] = a substring with hash hs
                 return i
         hs := hash(s[i+1..i+m]) // <---- calculating current hash
[...]

我们计算的电流的哈希值 M 步骤。在每一步有一个的临时的散列值,我们可以看看(O(1)复杂性)的散列集。所有的哈希值将具有相同的大小,即32位。

We calculate current hash in m steps. On each step there is a temporary hash value that we can look up ( O(1) complexity ) in the set of hashes. All hashes will have the same size, ie 32 bit.

更新2:的摊销(平均)O(n)的时间复杂度?

上面我说的是 M 必须具有最大字符串长度。事实证明,我们可以利用相反。
随着<一href="http://en.wikipedia.org/wiki/Rabin-Karp%5Fstring%5Fsearch%5Falgorithm#Use%5Fof%5Fhashing%5Ffor%5Fshifting%5Fsubstring%5Fsearch"相对=nofollow>散列,用于移动字符串搜索和固定 M 尺寸就可以达到O(n)的复杂性。

如果我们有不同的长度,我们可以设置 M 到最小字符串长度。此外,在散列集,我们不散列与整个字符串,但与第一m个字符的关联。
现在,虽然搜索的文本,我们检查当前哈希散列集,我们检查了匹配相关联的弦。

这种技术将增加误报但平均有O(n)的时间复杂度。

Update 2: an amortized (average) O(n) time complexity ?

Above I said that m must have the maximum string length. It turns out that we can exploit the opposite.
With hashing for shifting substring search and a fixed m size we can achieve O(n) complexity.

If we have variable length strings we can set m to the minimum string length. Additionally, in the set of hashes we don't associate a hash with the whole string but with the first m-characters of it.
Now, while searching the text we check if the current hash is in the hash set and we examine the associated strings for a match.

This technique will increase the false alarms but on average it has O(n) time complexity.

这篇关于使用拉宾,卡普搜索多个字符串中的模式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆