Z算法的直觉 [英] Intuition behind the Z algorithm
问题描述
Z算法是具有O(n)复杂度的字符串匹配算法.
The Z algorithm is a string matching algorithm with O(n) complexity.
一个用例是从字符串B中找到字符串A的最长出现时间.例如,从"stackoverflow"
中"overdose"
的最长出现时间为"over"
.您可以通过使用组合字符串"overdose#stackoverflow"
(其中#是两个字符串中都不存在的某个字符)调用Z算法来发现这一点.然后Z算法将尝试将组合的字符串与其自身进行匹配-并创建一个数组z [],其中z [i]为您提供从索引i开始的最长匹配长度.在我们的示例中:
One use case is finding the longest occurence of string A from string B. For example, the longest occurence of "overdose"
from "stackoverflow"
would be "over"
. You could discover this by calling the Z algorithm with a combined string "overdose#stackoverflow"
(where # is some character not present in either string). The Z algorithm would then try to match the combined string with itself - and create an array z[] where z[i] gives you the length of longest match starting from index i. In our example:
index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
string o v e r d o s e # s t a c k o v e r f l o w
z (21) 0 0 0 0 1 0 0 0 0 0 0 0 0 4 0 0 0 0 0 1 0
该算法有很多代码实现和面向数学的解释,下面是一些不错的方法:
There are plenty of code implementations and mathematically oriented explanations of the algorithm, here are some good ones:
http://www.geeksforgeeks.org/z -算法线性时间模式搜索算法/ http://codeforces.com/blog/entry/3107
我可以看到它的工作原理,但我不明白为什么.似乎几乎像黑魔法.我有一个很强烈的直觉,认为该任务应该承担O(n^2)
,但是这是一种在O(n)
I can see how it works, but I don't understand why. It seems almost like black magic. I have a very strong intuition that this task is supposed to take O(n^2)
, yet here is an algorithm that does it in O(n)
推荐答案
我也不认为它完全直观,因此我认为我有资格回答.否则,我只是说你不明白,因为你是个白痴,这肯定不是你所希望的答案:-)
I don't find it completely intuitive either, so I think that I qualify for answering. Otherwise I'd just say that you don't understand because you're an idiot, and surely that's not the answer your hoping for :-)
要点(解释中的引用):
Case in point (citation from an explanation):
Correctness is inherent in the algorithm and is pretty intuitively clear.
所以,让我们尝试更加直观...
So, let's try to be even more intuitive...
首先,我猜想O(n ^ 2)的直觉是这样的:对于长度为N的字符串,如果您被丢弃在字符串中的任意位置i而没有其他信息,则匹配x(< N)个字符以计算Z [i].如果您丢了N次,则最多必须进行N(N-1)个测试,所以测试的结果就是O(n ^ 2).
First, I'd guess that the common intuition for O(n^2) is this: for a string of length N, if you're dropped at a random place i in the string with no other information, you have to match x (< N) characters to compute Z[i]. If you're dropped N times, you have to do up to N(N-1) tests, so that's O(n^2).
但是,Z算法很好地利用了您从过去的计算中获得的信息.
The Z algorithm, however, makes good use of the informations you've gained from the past computations.
让我们看看.
首先,只要没有匹配项(Z [i] = 0),就沿着字符串前进,每个字符进行一个比较,因此为O(N). 其次,当找到一个匹配项的范围(在索引i处)时,诀窍是使用先前Z [0 ... i-1]的巧妙演绎来计算该范围中的所有Z值恒定时间,而没有在该范围内进行其他比较.下一场比赛将仅在范围的右侧进行.
First, as long as you don't have a match (Z[i]=0), you progress along the string with one comparison per character, so that's O(N). Second, when you find a range where there's a match (at index i), the trick is to use clever deductions using the previous Z[0...i-1] to compute all the Z values in that range in constant time, without other comparisons inside that range. The next matches will only be done on the right of the range.
无论如何,这就是我的理解,希望对您有所帮助.
That's how I understand it anyway, hope this helps.
这篇关于Z算法的直觉的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!