身后移过whem错配发生在KMP算法的文字推理? [英] Reasoning behind shifting over the text whem mismatch occurs in KMP algorithm?

查看:129
本文介绍了身后移过whem错配发生在KMP算法的文字推理?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在试图理解KMP算法。我仍然没有落后KMP算法推理的清醒的认识。假设我的文字是 bacbababaabcbab 和模式是 abababca 。通过使用最长真$ P $的子(模式)匹配的子(图案)适当的后缀PFIX长度的规则,我充满了我的表[]

I have been trying to understand KMP algorithm. Still I didn't get the clear understanding of reasoning behind kmp algorithm. Suppose my text is bacbababaabcbab and pattern is abababca. By using the rule of length of longest proper prefix of sub(pattern) that matches the proper suffix of sub(pattern), I filled my table[].

A B A B A B C A
  0 0 1 2 3 4 0 1

a b a b a b c a
0 0 1 2 3 4 0 1

现在我开始申请KMP算法与我的模式和表中的文本。

Now I started applying KMP algorithm on the text with my pattern and table.

即将上面的文字指数4后,我们有长度(L)= 5匹配; 通过查看表[1- 1] = 3; 按KMP算法,我们可以跳过长度可达2个字符,并可以继续

After coming to index 4 of above text, we have a match of length(l)=5; by looking at table[l-1]=3; As per KMP algorithm we can skip length up to 2 chars and can continue .

bacbababaabcbab
     ---- XX |||
       abababca

bacbababaabcbab
----xx|||
abababca

在这里我没有收到背后的逻辑发生变化。为什么要转变?可有人请澄清我的困惑?

Here I am not getting the logic behind shifting. Why should we shift? Can somebody please clarify my confusion?

推荐答案

要了解KMP算法背后的逻辑,你应该先了解,如何KMP算法中比蛮力算法更好。

To understand the logic behind the KMP algorithm , you should first understand , how this KMP algo is better than brute-force algorithm .

Idea

在格局的转变,天真的算法已经忘记了previously匹配的符号的所有信息。因此,它是可能的,它连连重新比较具有不同图案符号的文本符号。这导致Θ(NM)(N:文本,米长度:模式长度)的最坏情况的复杂性。

After a shift of the pattern, the naive algorithm has forgotten all information about previously matched symbols. So it is possible that it re-compares a text symbol with different pattern symbols again and again. This leads to its worst case complexity of Θ(nm) (n: length of the text, m: length of the pattern).

高德纳,莫里斯和普拉特[KMP 77]的算法利用由previous符号的比较中获得的信息。它永不再比较已经匹配的模式符号的文本符号。其结果是,在该克努特 - 莫里斯 - 普拉特算法的搜索阶段的复杂度为O(N)。

The algorithm of Knuth, Morris and Pratt [KMP 77] makes use of the information gained by previous symbol comparisons. It never re-compares a text symbol that has matched a pattern symbol. As a result, the complexity of the searching phase of the Knuth-Morris-Pratt algorithm is in O(n).

然而,图案的preprocessing是必要的,以分析其结构。在preprocessing阶段具有的O(M)的复杂性。由于M< = n时,克努特 - 莫里斯 - 普拉特算法的整体复杂度为O(N)

However, a preprocessing of the pattern is necessary in order to analyze its structure. The preprocessing phase has a complexity of O(m). Since m<=n, the overall complexity of the Knuth-Morris-Pratt algorithm is in O(n).

正文:bacbababaabcbab 图案:abababca

text :bacbababaabcbab pattern:abababca

在穷举法, 一个滑动模式上的文字之一,并检查匹配。如果发现匹配,然后通过1再次滑动,检查以后的比赛。

In brute-force method , Slide the pattern over text one by one and check for a match. If a match is found, then slides by 1 again to check for subsequent matches .

void search(char *pat, char *txt)
{
    int M = strlen(pat);
    int N = strlen(txt);

    /* A loop to slide pat[] one by one */
    for (int i = 0; i <= N - M; i++)
    {
        int j;

        /* For current index i, check for pattern match */
        for (j = 0; j < M; j++)
        {
            if (txt[i+j] != pat[j])
                break;
        }
        if (j == M)  // if pat[0...M-1] = txt[i, i+1, ...i+M-1]
        {
           printf("Pattern found at index %d \n", i);
        }
    }
}

上面的算法复杂度为O(NM)。 在上面的算法中,我们从来没有使用过,我们处理的比较数据,

The complexity of above algorithm is O(nm). In the above algorithm we never used comparison data we processed,

Bacbababaabcbab   //let I be iterating variable over this text

Abababca    //j be iterating variable over pattern

当我= 0和j = 0有错配(文[I + J]!=拍拍[J]),我们增加我,直到有一个匹配。 当i = 4,存在匹配(文字[I + J] ==轻拍[j]的),增量Ĵ直到我们找到失配(如果j = patternlength我们发现图案),在给定的例子中,我们发现在不匹配J = 5当i = 4,错配发生在IDEX 4 + 5 = 9中的文字。匹配的子字符串是贝巴, **

When i=0 and j=0 there is a mismatch (text[i+j]!=pat[j]), we increment i until there is a match . When i =4 , there is a match(text[i+j]==pat[j]), increment j till we find mismatch (if j= patternlength we found pattern) ,in the given example we find mismatch at j=5 when i=4 , a mismatch happens at idex 4+5=9 in text. The sub string that matched is ababa , **

  • 为什么我们需要选择合适的时间最长的preFIX这也是正确的后缀:

** 从上面的:我们看到的模式与子亚的斯亚贝巴结束的错配发生在9。 现在,如果我们想充分利用了我们目前所做那么我们就可以跳过比较(增量)我超过1,则比较的数量将减少从而获得更好的时间复杂度。
现在有什么优势,我们可以采取处理比较数据贝巴。 如果我们仔细看:preFIX ABA 串亚的斯亚贝巴与文本比较和匹配,同样是后缀的情况下的 ABA 。但有一个不匹配'B'与'A'

** From the above : we see that mismatch happened at 9 where pattern ended with substring ababa . Now if we want to take advantage over the comparisons we have done so far then we can skip (increment) i more than 1 then the numbers of comparisons will be reduced leading to better time complexity.
Now what advantage we can take on processed comparison data "ababa" . If we see carefully: the prefix aba of string ababa is compared with text and matched, same is the case with suffix aba. But there is a mismatch ‘b’ with ‘a’

Bacbababaabcbab
         ||||||            
         ||||| x
        |||| ||
        ababab

但根据幼稚的做法,我们增加i到5。但我们知道,看着它,我们可以设置I = ABA的6作为下一次出现发生在每一个元素的文本比较,在6,所以不是我们preprocess图案为寻找最长适当preFIX这也是正确的后缀(这被称为边界)。在上面的例子中为巴,和最长边界长度为3(即<强>阿巴)。所以递增:子长 - 边境线最长的=> 5-3 = 2
长度。 如果我们比较失败,在ABA,然后长边境线最长的为1和j = 3×2所以递增。

But according to naïve approach, we increment i to 5. But we know by looking at it, we can set i =6 as next occurrence of aba occurs at 6. So instead of comparing with each and every element in text we preprocess the pattern for finding the longest proper prefix which is also proper suffix (which is called border). In the above example for ‘ababa’ ,and length of longest border is 3 (which is aba) . So increment by: length of substring – length of longest border => 5-3 =2.
If our comparison fails at aba , then length of longest border is 1 and j=3 so increment by 2 .

更多关于如何preprocess: HTTP ://www-igm.univ-mlv.fr/~lecroq/string/node8.html#SECTION0080 http://www.inf.fh-flensburg.de/lang /algorithmen/pattern/kmpen.htm

For more on how to preprocess : http://www-igm.univ-mlv.fr/~lecroq/string/node8.html#SECTION0080 http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm

这篇关于身后移过whem错配发生在KMP算法的文字推理?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆