What's the worst-case complexity for KMP when the goal is to find all occurrences of a certain string?

Question

I would also like to know which algorithm has the worst-case complexity of all for finding all occurrences of a string in another. It seems that the Boyer–Moore algorithm has linear time complexity.

Recommended answer

The KMP algorithm has linear complexity for finding all occurrences of a pattern in a string, like the Boyer-Moore algorithm¹. If you try to find a pattern like "aaaaaa" in a string like "aaaaaaaaa", once you have the first complete match,

aaaaaaaaa
aaaaaa
 aaaaaa
      ^

the border table contains the information that the next longest possible match of a prefix of the pattern (corresponding to the widest border of the pattern) is just one character short (in this respect, a complete match is equivalent to a mismatch one past the end of the pattern). Thus the pattern is moved one place further, and since it is known from the border table that all characters of the pattern, except possibly the last, match, the next comparison is between the last pattern character and the aligned text character. In this particular case (finding occurrences of a^m in a^n), which is the worst case for the naive matching algorithm, the KMP algorithm compares each text character exactly once.
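The behaviour described above can be checked with a small Python sketch. This is one common way to write KMP, not necessarily the exact variant the answer has in mind. The names border_table and kmp_find_all and the comparison counter are introduced here purely for illustration, and a non-empty pattern is assumed.

```python
def border_table(pattern):
    """border[i] = length of the widest border of pattern[:i]; border[0] = -1."""
    border = [-1] + [0] * len(pattern)
    k = -1
    for i, ch in enumerate(pattern):
        # Fall back to shorter borders until one can be extended by ch.
        while k >= 0 and pattern[k] != ch:
            k = border[k]
        k += 1
        border[i + 1] = k
    return border


def kmp_find_all(text, pattern):
    """Return (start indices of all occurrences, number of character comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty (assumption of this sketch)")
    m = len(pattern)
    border = border_table(pattern)
    matches = []
    comparisons = 0
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j >= 0:
            comparisons += 1
            if pattern[j] == ch:
                break
            j = border[j]  # mismatch: shift the pattern to its next border
        j += 1
        if j == m:
            matches.append(i - m + 1)
            # A complete match is treated like a mismatch one past the end.
            j = border[m]
    return matches, comparisons


matches, comparisons = kmp_find_all("a" * 9, "a" * 6)
print(matches, comparisons)
```

For the pattern aaaaaa in the text aaaaaaaaa this prints [0, 1, 2, 3] 9, that is, four occurrences found with one comparison per text character.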

In each step, at least one of

  • the position of the text character compared

  • the position of the first character of the pattern with respect to the text

increases, and neither ever decreases. The position of the text character compared can increase at most length(text)-1 times, the position of the first pattern character can increase at most length(text) - length(pattern) times, so the algorithm takes at most 2*length(text) - length(pattern) - 1 steps.

The preprocessing (construction of the border table) takes at most 2*length(pattern) steps, thus the overall complexity is O(m+n), and no more than m + 2*n steps are executed, where m is the length of the pattern and n the length of the text.
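As a rough illustration of the preprocessing cost, here is a Python sketch that builds a border table and counts character comparisons. The function name and the convention border[0] = -1 are assumptions made for this example (other presentations use a 0-based failure function), and "steps" are approximated here by comparisons, so the exact constant may differ from the count used above.

```python
def border_table_with_count(pattern):
    """Return (border table, number of character comparisons used to build it).

    border[i] is the length of the widest border of pattern[:i],
    with border[0] = -1 as a sentinel.
    """
    m = len(pattern)
    border = [-1] + [0] * m
    comparisons = 0
    k = -1  # length of the border currently being extended
    for i in range(m):
        while k >= 0:
            comparisons += 1
            if pattern[k] == pattern[i]:
                break
            k = border[k]  # try the next shorter border
        k += 1
        border[i + 1] = k
    return border, comparisons


for p in ("aaaaaa", "abab"):
    table, count = border_table_with_count(p)
    print(p, table, count, "<=", 2 * len(p))
```

In this sketch every successful comparison is followed by a move to the next pattern position and every failed one strictly shortens the current border, which is why the count stays below 2*length(pattern).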

¹ Note that the Boyer-Moore algorithm as commonly presented has a worst-case complexity of O(m*n) for periodic patterns and texts like a^m and a^n if all matches are required, because after a complete match,

aaaaaaaaa
aaaaaa
 aaaaaa
      ^
  <- <-
 ^

the entire pattern would be re-compared. To avoid that, you need to remember how long a prefix of the pattern still matches after the shift following a complete match and only compare the new characters.
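The effect can be sketched in Python with a deliberately simplified right-to-left matcher. This is not Boyer-Moore proper: it has no bad-character or good-suffix tables and simply shifts by one position on a mismatch. Only the shift after a complete match (by the pattern's period) and the optional remembered prefix, a refinement often attributed to Galil, are modelled. All names are hypothetical and a non-empty pattern is assumed.

```python
def widest_border(pattern):
    """Length of the longest proper prefix of pattern that is also a suffix."""
    b = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = b[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        b[i] = k
    return b[-1]


def right_to_left_all(text, pattern, remember_prefix):
    """Return (start indices of all occurrences, number of character comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty (assumption of this sketch)")
    m, n = len(pattern), len(text)
    overlap = widest_border(pattern)
    matches, comparisons = [], 0
    start = 0  # text position aligned with pattern[0]
    known = 0  # leading pattern characters already known to match
    while start + m <= n:
        j = m - 1
        while j >= known:  # compare right to left, skipping the known prefix
            comparisons += 1
            if pattern[j] != text[start + j]:
                break
            j -= 1
        if j < known:
            matches.append(start)
            start += m - overlap  # shift by the period of the pattern
            known = overlap if remember_prefix else 0
        else:
            start += 1  # simplified mismatch shift, not the real heuristics
            known = 0
    return matches, comparisons


print(right_to_left_all("a" * 9, "a" * 6, remember_prefix=False))
print(right_to_left_all("a" * 9, "a" * 6, remember_prefix=True))
```

Without the remembered prefix this prints ([0, 1, 2, 3], 24), since all six pattern characters are re-compared at each of the four alignments. With it, the result is ([0, 1, 2, 3], 9), because only the one new character is compared after each shift.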
