Rabin Karp算法-给定输入的最坏情况O(m * n)如何? [英] Rabin Karp Algorithm - How is the worst case O(m*n) for the given input?

查看:127
本文介绍了Rabin Karp算法-给定输入的最坏情况O(m * n)如何?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在RK算法的顶级编码器代码中:

In the Top Coder's code of RK algorithm:

// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
  return (a % b + b) % b;
}

function Rabin_Karp(text[], pattern[])
{
  // let n be the size of the text, m the size of the
  // pattern, B - the base of the numeral system,
  // and M - a big enough prime number

  if(n < m) return; // no match is possible

  // calculate the hash value of the pattern
  hp = 0;
  for(i = 0; i < m; i++) 
    hp = int_mod(hp * B + pattern[i], M);

  // calculate the hash value of the first segment 
  // of the text of length m
  ht = 0;
  for(i = 0; i < m; i++) 
    ht = int_mod(ht * B + text[i], M);

  if(ht == hp) check character by character if the first
               segment of the text matches the pattern;

  // start the "rolling hash" - for every next character in
  // the text calculate the hash value of the new segment
  // of length m; E = (Bm-1) modulo M            
  for(i = m; i < n; i++) {
    ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
    ht = int_mod(ht * B, M);
    ht = int_mod(ht + text[i], M);

    if(ht == hp) check character by character if the
                 current segment of the text matches
                 the pattern; 
  }
}

据信

不幸的是,在某些情况下,我们仍然必须为文本中的每个起始位置运行天真"方法的整个内部循环–例如,当在字符串"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"中搜索模式"aaa"时—因此,在最坏的情况下,我们仍然需要(n * m)次迭代.

Unfortunately, there are still cases when we will have to run the entire inner loop of the "naive" method for every starting position in the text – for example, when searching for the pattern "aaa" in the string "aaaaaaaaaaaaaaaaaaaaaaaaa" — so in the worst case we will still need (n * m) iterations.

但是算法不会在第一次迭代时停止吗-就像它会看到前三个字母是'a'并与针相匹配吗?

But won't the algorithm stop at the first iteration itself - as when it will see that first three alphabets are 'a' which matches the needle ?

推荐答案

Rabin-Karp算法始终计算大小为Mtext的所有子字符串的哈希值,并将其与pattern.现在,可以有多个具有相同哈希值的子字符串.

Rabin-Karp algorithm keeps computing hash values of all the substring of text of size M and matches it with that of the hash value of the pattern. Now, there can be multiple substrings having a same hash value.

因此,当pattern的哈希值和text的某些子字符串匹配时,我们需要再次对其进行迭代,以确保它们实际上是相同的.

So when the hash values of the pattern and some substring of the text match, we need to iterate over them again just to make sure if they are actually same.

对于pattern = "AAA"text = "AAAAAAAAAAAAA",存在与pattern的哈希值匹配的O(n)子字符串.对于每次比赛,我们都需要迭代以确认在O(m)时间;因此是最坏情况下的复杂度O(n*m).

In case of pattern = "AAA" and text = "AAAAAAAAAAAAA", there are O(n) substrings matching the hash value of the pattern. And for every match, we need to iterate over to confirm in O(m) time; hence the worst-case complexity O(n*m).

这篇关于Rabin Karp算法-给定输入的最坏情况O(m * n)如何?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆