KMP字符串匹配算法:Auxillary阵列输出 [英] KMP string matching algorithm:Auxillary array output

查看:110
本文介绍了KMP字符串匹配算法:Auxillary阵列输出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的 KMP字符串匹配算法的实施。 当我检查PI阵列,它存储0,1,2,3,4,5,6。但根据算法中的书应该是0,0,1,2,3,0,1。我的code给出正确的结果also.I不明白为什么会这样,还是我做错了什么?如果是的话,请大家指正。

感谢。

 #包括<的iostream>
#包括<字符串>
#包括< string.h中>

使用名字空间std;

为int *计算preFIX(字符P [])
{
    为size_t米= strlen的(P);
    INT * PI =新INT [M]。
    圆周率[0] = 0;
    INT K = 0;

    对于(INT Q = 0; Q&LT,M,Q ++)
    {
        如果(K大于0&安培;&安培; P [K + 1] = P [Q]!)
            K =圆周率[k]的;

        如果(P [K + 1] == P [Q])
            {
                PI [Q] = K;
                K = K + 1;
            }
            PI [Q] = K;
    }

    返回(PI);
}

无效KMP_Matcher(字符T [],焦炭P [])
{

    为size_t N =的strlen(T);
    为size_t米= strlen的(P);

    INT * PI =新INT [M]。
    PI =计算preFIX(P);

    COUT<< ENDL;


    INT Q = 0;
    的for(int i = 0; I< = N;我++)
    {
        如果(Q> 0安培;&安培;!P [Q] = T [I])
        {
            Q = PI [Q  -  1];
        }


        否则,如果(P [Q] == T [I])
        {


            如果(Q == M-1)
            {
                COUT<<转变发生在:<< I-Q<< ENDL;
                Q = PI [Q]
            }
            否则Q = Q + 1;
        }

        否则q ++;
    }
}


诠释的main()
{
    字符T [] =abababacaba;
    字符P [] =ababaca;

    KMP_Matcher(T,P);
    返回0;
}
 

解决方案

您跳转表构造函数根本不检查针prefixes。我们希望能够自己看,在针的每个位置中,最长的可能的适当$ P $长度PFIX 的导致到(但不包括)针那个位置,比其他的全preFIX开始针[0] 刚眼福;这是多远,我们在寻找下一场比赛原路返回。因此,在跳转表的每个条目(例如,表[I] )是完全针的最长可能适当preFIX的长度,这也是一个$ P $在结束串针PFIX [我 - 1]。

在跳转表中的前两个条目是-1和0中,由于一)不匹配,在该图案的开始并不触发回溯(或,换句话说,零长度一个preFIX不能有任何正确的prefixes或后缀)和b)空字符串被认为是一个长度为0。

详情请看维基百科或算法的教科书。

在code来完成上面是:

 为int * build_jump_table(为const char *的目标)
{
    如果(!目标)
        返回NULL;
    INT *表=新的INT [strlen的(目标)+ 1];
    如果(!表)
        返回NULL;
    表[0] = 1; / *未使用的匹配,就在这里使用* /

    的for(int i = 0;目标[我] ='\ 0';!我++){
        表[I + 1] =表[I] + 1;
        而(表[I + 1]大于0&安培;&安培;!靶[I] =目标[表[I + 1]  -  1]){
            表[I + 1] =表[表[I + 1]  -  1] + 1;
        }
    }
    返回表;
}
 

这是相当冗长,可以简化了很多,当你理解了跳转表背后的概念。

This is my Implementation of KMP string matching algorithm. When i check pi array ,it stores 0,1,2,3,4,5,6. But according to algo books it should be 0,0,1,2,3,0,1. My code give correct result also.I don't understand why this is happening, or am I doing something wrong ? and if so ,please correct me.

thanks.

#include<iostream>
#include<string>
#include<string.h>

using namespace std;

int* ComputePrefix(char P[])
{
    size_t m = strlen(P);
    int *pi = new int[m];
    pi[0] = 0;
    int k = 0;

    for(int q =0; q < m; q++)
    {
        if( k > 0 && P[k+1] != P[q])
            k = pi[k];

        if( P[k+1] == P[q])
            {
                pi[q] = k;
                k = k + 1;
            }
            pi[q]=k;
    }

    return (pi);
}

void KMP_Matcher(char T[], char P[])
{

    size_t n = strlen(T);
    size_t m = strlen(P);

    int *pi = new int[m];
    pi = ComputePrefix(P);

    cout<<endl;


    int q =0;
    for (int i = 0; i <= n; i++)
    {
        if( q > 0 && P[q] != T[i] )
        {
            q = pi[q - 1];
        }


        else if( P[q] == T[i])
        {


            if( q == m-1)
            {
                cout<<"Shift occurs at : "<< i-q <<endl;
                q = pi[q];
            }
            else q = q + 1;
        }

        else q++;
    }
}


int main()
{
    char T[] = "abababacaba";
    char P[] = "ababaca";

    KMP_Matcher(T,P);
    return 0;
}

解决方案

Your jump table constructing function simply does not check the needle for prefixes. We want to be able to look up, for each position in the needle, the length of the longest possible proper prefix of the needle leading up to (but not including) that position, other than the full prefix starting at needle[0] that just failed to match; this is how far we have to backtrack in finding the next match. Hence each entry in the jump table (say, table[i]) is exactly the length of the longest possible proper prefix of the needle which is also a prefix of the substring ending at needle[i - 1].

The first two entries in the jump table are -1 and 0, since a) a mismatch at the start of the pattern does not trigger backtracking (or, in other words, a prefix of zero length cannot have any proper prefixes or suffixes) and b) the empty string is considered to be of length 0.

For more details please look at wikipedia or an algorithms textbook.

The code to accomplish the above is:

int *build_jump_table(const char * target)
{
    if(!target)
        return NULL;
    int *table = new int[strlen(target) + 1];
    if(!table)
        return NULL;
    table[0] = -1; /* unused by the matcher, just used here */

    for(int i = 0; target[i] != '\0'; i++) {
        table[i+1] = table[i] + 1;
        while(table[i+1] > 0 && target[i] != target[table[i+1] - 1]) {
            table[i + 1] = table[table[i + 1] - 1] + 1;
        }
    }
    return table;
}

which is quite verbose, and can be simplified a lot when you understand the concept behind the jump table.

这篇关于KMP字符串匹配算法:Auxillary阵列输出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆