Z算法的直觉 [英] Intuition behind the Z algorithm

查看:125
本文介绍了Z算法的直觉的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Z算法是具有O(n)复杂度的字符串匹配算法.

The Z algorithm is a string matching algorithm with O(n) complexity.

一个用例是从字符串B中找到字符串A的最长出现时间.例如,从"stackoverflow""overdose"的最长出现时间为"over".您可以通过使用组合字符串"overdose#stackoverflow"(其中#是两个字符串中都不存在的某个字符)调用Z算法来发现这一点.然后Z算法将尝试将组合的字符串与其自身进行匹配-并创建一个数组z [],其中z [i]为您提供从索引i开始的最长匹配长度.在我们的示例中:

One use case is finding the longest occurence of string A from string B. For example, the longest occurence of "overdose" from "stackoverflow" would be "over". You could discover this by calling the Z algorithm with a combined string "overdose#stackoverflow" (where # is some character not present in either string). The Z algorithm would then try to match the combined string with itself - and create an array z[] where z[i] gives you the length of longest match starting from index i. In our example:

index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
string o  v  e  r  d  o  s  e  #  s  t  a  c  k  o  v  e  r  f  l  o  w
z    (21) 0  0  0  0  1  0  0  0  0  0  0  0  0  4  0  0  0  0  0  1  0

该算法有很多代码实现和面向数学的解释,下面是一些不错的方法:

There are plenty of code implementations and mathematically oriented explanations of the algorithm, here are some good ones:

http://www.geeksforgeeks.org/z -算法线性时间模式搜索算法/ http://codeforces.com/blog/entry/3107

我可以看到它的工作原理,但我不明白为什么.似乎几乎像黑魔法.我有一个很强烈的直觉,认为该任务应该承担O(n^2),但是这是一种在O(n)

I can see how it works, but I don't understand why. It seems almost like black magic. I have a very strong intuition that this task is supposed to take O(n^2), yet here is an algorithm that does it in O(n)

推荐答案

我也不认为它完全直观,因此我认为我有资格回答.否则,我只是说你不明白,因为你是个白痴,这肯定不是你所希望的答案:-)

I don't find it completely intuitive either, so I think that I qualify for answering. Otherwise I'd just say that you don't understand because you're an idiot, and surely that's not the answer your hoping for :-)

要点(解释中的引用):

Case in point (citation from an explanation):

Correctness is inherent in the algorithm and is pretty intuitively clear.

所以,让我们尝试更加直观...

So, let's try to be even more intuitive...

首先,我猜想O(n ^ 2)的直觉是这样的:对于长度为N的字符串,如果您被丢弃在字符串中的任意位置i而没有其他信息,则匹配x(< N)个字符以计算Z [i].如果您丢了N次,则最多必须进行N(N-1)个测试,所以测试的结果就是O(n ^ 2).

First, I'd guess that the common intuition for O(n^2) is this: for a string of length N, if you're dropped at a random place i in the string with no other information, you have to match x (< N) characters to compute Z[i]. If you're dropped N times, you have to do up to N(N-1) tests, so that's O(n^2).

但是,Z算法很好地利用了您从过去的计算中获得的信息.

The Z algorithm, however, makes good use of the informations you've gained from the past computations.

让我们看看.

首先,只要没有匹配项(Z [i] = 0),就沿着字符串前进,每个字符进行一个比较,因此为O(N). 其次,当找到一个匹配项的范围(在索引i处)时,诀窍是使用先前Z [0 ... i-1]的巧妙演绎来计算该范围中的所有Z值恒定时间,而没有在该范围内进行其他比较.下一场比赛将仅在范围的右侧进行.

First, as long as you don't have a match (Z[i]=0), you progress along the string with one comparison per character, so that's O(N). Second, when you find a range where there's a match (at index i), the trick is to use clever deductions using the previous Z[0...i-1] to compute all the Z values in that range in constant time, without other comparisons inside that range. The next matches will only be done on the right of the range.

无论如何,这就是我的理解,希望对您有所帮助.

That's how I understand it anyway, hope this helps.

这篇关于Z算法的直觉的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆