打印最长公共序列 [英] Print the longest common sequence

查看:106
本文介绍了打印最长公共序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想实现的经典问题的解决办法,但我很困惑,我得到一些情况下,我没有找到很好的效果。

输出:最长公共序列,长度

 高清longestCommonSeq(X,Y):
    LCS = [0用于z的范围(LEN(Y)+1)用于z范围内(LEN(X)+1)]
    因为我在范围内(LEN(X)):
        LCS [I] [0] = 0

    对于j的范围(LEN(y))为:
        LCS [0] [J] = 0

    因为我在范围内(LEN(X)):
        对于j的范围(LEN(y))为:
            如果x [我] == Y [J]:
                LCS [I] [j]的= 1 + LCS [I-1] [J-1]
            其他:
                LCS [I] [J] = MAX(LCS [I-1] [J],LCS [I] [J-1])


    I,J = len个(x)的,LEN(y)的
    打印LCS

    RES =
    虽然我大于0且j大于0:

        如果x [I-1] == Y [J-1]:
            解析度= X [I-1] +水库
            I- = 1
            J- = 1
        其他:
            如果LCS [I-1] [j]的> LCS [I] [J-1]:
                I- = 1
            其他:
                J- = 1

    返回水库,LCS [LEN(X)-1] [LEN(Y)-1]
 

对于第一种情况:

  X =AGGTAB
Y =GXTXAYB
打印longestCommonSeq(X,Y)
 

结果: (GTAB',4) 这是对的!

但情况如下:

  A =APBCADCQER
B =RASBTAUCVE
打印longestCommonSeq(A,B)

 ('',5)
 

我错过了什么?,谁能给一个提示?

解决方案

您正在建表错误的;你应该离开一行 I = 0 J = 0 柱在0完全,并把比赛进入较高的行。这意味着你需要加1 Ĵ坐标在第一循环中,像Python开始指数0

这样一来,你最终以0在最后一台;使用类似于维基百科的解释的本质就是建立这个符​​号:

<$p$p><$c$c>┌─┬─────┬─────────┬─────────┬─────┬──────┬───────┬─┐ ││A│G│G│T│A│B││ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │G│&LT; ^0│\ G变│\ G变│&LT;摹│&LT;摹│&LT;摹│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │X│&LT; ^ ^0││摹&LT; ^摹│&LT; ^G│&LT; ^摹│&LT; ^摹│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │T│&LT; ^ ^0││摹&LT; ^摹│\ GT│&LT; GT│&LT; GT│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │X│&LT; ^ ^0││摹&LT; ^摹│^ GT│&LT; ^GT│&LT; ^ GT│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │A│\ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│\横行霸道│&LT; GTA│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │I│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│&LT; ^GTA│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │B│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│\ GTAB│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ ││0│0│0│0│0│0│0│ └─┴─────┴─────────┴─────────┴─────┴──────┴───────┴─┘ &LT;和^:最大挑 \:匹配字符添加到细胞的左上角。

这是留给你的最后一列和行设置为0而不是第一个。这是你的运气了Python除$ P $点 1 作为最后一个元素,因为 I = 0 J = 0 这正是你最终做的事情。你想要什么,而不是是:

<$p$p><$c$c>┌─┬─┬─────┬─────────┬─────────┬─────┬──────┬───────┐ │││A│G│G│T│A│B│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ ││0│0│0│0│0│0│0│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │G│0│&LT; ^0│\ G变│\ G变│&LT;摹│&LT;摹│&LT;摹│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │X│0│&LT; ^ ^0││摹&LT; ^摹│&LT; ^G│&LT; ^摹│&LT; ^摹│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │T│0│&LT; ^ ^0││摹&LT; ^摹│\ GT│&LT; GT│&LT; GT│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │X│0│&LT; ^ ^0││摹&LT; ^摹│^ GT│&LT; ^GT│&LT; ^ GT│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │A│0│\ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│\横行霸道│&LT; GTA│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │I│0│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│&LT; ^GTA│ ├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤ │B│0│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│\ GTAB│ └─┴─┴─────┴─────────┴─────────┴─────┴──────┴───────┘

接下来,你并不需要咨询 X 在第二循环中,你咨询的只是的你的 LCS 表。因为你结束了 0 ,总是最后一排和列中,你的算法的版本放平当 X 不匹配,你只落得递减 Ĵ,你不能打LCS路径列于表几乎是随机。

请注意,你的前两个环路完全是多余的,可以删除;你的整个 LCS 表仅包含 0 值,为什么它们设置为 0 更多一些?伪code在维基百科使用从1循环的长度(含),但由于Python从0开始,你加1的长度,保持长度包容,你不需要在这里做这个。

您校正功能则是这样的:

 高清longestCommonSeq(X,Y):
    LCS = [[0用于z在范围(LEN(γ)+ 1)]用于z在范围(LEN(X)+1)]
    因为我在范围内(LEN(X)):
        对于j的范围(LEN(y))为:
            如果x [我] == Y [J]:
                LCS [I + 1] [J + 1] = 1 + LCS [I] [j]的
            其他:
                LCS [I + 1] [J + 1] = MAX(LCS [I] [J + 1],LCS [I + 1] [J]。)

    RES =''
    I,J = len个(x)的,LEN(y)的

    而i和j:
        如果LCS [I] [J] == LCS [I-1] [J]:
            我 -  = 1
        ELIF LCS [I] [J] == LCS [I] [J-1]:
            的J  -  = 1
        其他:
            解析度= X [I-1] +水库
            我 -  = 1
            的J  -  = 1

    返回水库,LCS [-1] [ -  1]
 

由于该 LCS 表现正确建立,我们可以使用 LCS [-1] [ - 1] 作为最大长度值了。

这将产生预期的结果:

 &GT;&GT;&GT; X =AGGTAB
&GT;&GT;&GT; Y =GXTXAYB
&GT;&GT;&GT;打印longestCommonSeq(X,Y)
('GTAB',4)
&GT;&GT;&GT; A =APBCADCQER
&GT;&GT;&GT; B =RASBTAUCVE
&GT;&GT;&GT;打印longestCommonSeq(A,B)
('ABACE',5)
 

另一种方法是在你的第二个阶段解决掉接一个,看看实际符合您输入单词的LCS表格单元格;例如减去1 Ĵ

 高清longestCommonSeq(X,Y):
    LCS = [0用于z的范围(LEN(Y)+1)用于z范围内(LEN(X)+1)]

    因为我在范围内(LEN(X)):
        对于j的范围(LEN(y))为:
            如果x [我] == Y [J]:
                LCS [I] [J] = 1 + LCS [我 -  1] [J  -  1]
            其他:
                LCS [I] [J] = MAX(LCS [我 -  1] [J],LCS [I] [J  -  1])

    I,J = len个(X) -  1,LEN(Y) -  1
    打印LCS

    RES =
    而I&GT; = 0和J&GT; = 0:
        如果x [我] == Y [J]:
            RES = X [I] +资源
            我 -  = 1
            的J  -  = 1
        其他:
            如果LCS [我 -  1] [J] GT; LCS [I] [J  -  1]:
                我 -  = 1
            其他:
                的J  -  = 1

    返回水库,LCS [LEN(X) -  1] [LEN(Y) -  1]
 

现在Ĵ LEN(X) - 1 0 LEN(Y) - 1 0 分别,并且再次引用正确的细胞。

I am trying to implement solution of the classical problem, but I am confused as I get some cases where I don't find good results.

Output: "The Longest common sequence", "Length"

def longestCommonSeq(x,y):
    LCS = [[0 for z in range(len(y)+1)] for z in  range(len(x)+1)] 
    for i in range(len(x)):
        LCS[i][0]=0

    for j in range(len(y)):
        LCS[0][j]=0

    for i in range(len(x)):
        for j in range(len(y)):
            if x[i]== y[j]:
                LCS[i][j]= 1+LCS[i-1][j-1]
            else: 
                LCS[i][j]= max(LCS[i-1][j],LCS[i][j-1])


    i, j= len(x),len(y)
    print LCS

    res = "" 
    while i>0 and j>0:

        if x[i-1]== y[j-1]:
            res= x[i-1] + res
            i-=1
            j-=1 
        else:
            if LCS[i-1][j]>LCS[i][j-1]:
                i-=1
            else: 
                j-=1

    return res, LCS[len(x)-1][len(y)-1]

For the first case:

x= "AGGTAB"
y= "GXTXAYB"
print longestCommonSeq(x,y)

Result: ('GTAB', 4) This is right!

but for case like:

a= "APBCADCQER" 
b= "RASBTAUCVE"
print longestCommonSeq(a,b)

 ('', 5)

What did I miss ?, Can anyone give a hint ?

解决方案

You are building the table wrong; you should leave the row at i = 0 and the column at j = 0 at 0 entirely, and put matches into the higher rows. This means you need to add 1 to the i and j coordinates in the first loop, as Python starts indices at 0.

As a result, you end up with a table with the 0s at the end; using notation similar to the explanation on Wikipedia you essentially build this:

┌─┬─────┬─────────┬─────────┬─────┬──────┬───────┬─┐
│ │A    │G        │G        │T    │A     │B      │ │
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│G│< ^ 0│\ G      │\ G      │< G  │< G   │< G    │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│X│< ^ 0│^ G      │< ^ G    │< ^ G│< ^ G │< ^ G  │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│T│< ^ 0│^ G      │< ^ G    │\ GT │< GT  │< GT   │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│X│< ^ 0│^ G      │< ^ G    │^ GT │< ^ GT│< ^ GT │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│A│\ A  │< ^ A | G│< ^ A | G│^ GT │\ GTA │< GTA  │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│I│^ A  │< ^ A | G│< ^ A | G│^ GT │^ GTA │< ^ GTA│0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│B│^ A  │< ^ A | G│< ^ A | G│^ GT │^ GTA │\ GTAB │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│ │0    │0        │0        │0    │0     │0      │0│
└─┴─────┴─────────┴─────────┴─────┴──────┴───────┴─┘

< and ^: maximum picked
\: add matching character to cell to the top left.

This is leaving your last column and row set to 0 rather than the first. It is your luck that Python interprets -1 as the last element, because at i = 0 or j = 0 that is exactly what you end up doing. What you want instead is:

┌─┬─┬─────┬─────────┬─────────┬─────┬──────┬───────┐
│ │ │A    │G        │G        │T    │A     │B      │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│ │0│0    │0        │0        │0    │0     │0      │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│G│0│< ^ 0│\ G      │\ G      │< G  │< G   │< G    │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│X│0│< ^ 0│^ G      │< ^ G    │< ^ G│< ^ G │< ^ G  │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│T│0│< ^ 0│^ G      │< ^ G    │\ GT │< GT  │< GT   │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│X│0│< ^ 0│^ G      │< ^ G    │^ GT │< ^ GT│< ^ GT │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│A│0│\ A  │< ^ A | G│< ^ A | G│^ GT │\ GTA │< GTA  │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│I│0│^ A  │< ^ A | G│< ^ A | G│^ GT │^ GTA │< ^ GTA│
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│B│0│^ A  │< ^ A | G│< ^ A | G│^ GT │^ GTA │\ GTAB │
└─┴─┴─────┴─────────┴─────────┴─────┴──────┴───────┘

Next, you don't need to consult x and y in the second loop, you consult just your LCS table. Because you ended up with 0, always, in the last row and column, your version of the algorithm falls flat when the last character of x and y do not match and you just end up decrementing i and j almost randomly as you fail to hit the LCS path set out in the table.

Note that your first two for loops are entirely redundant and can be removed; your whole LCS table contains only 0 values, why set them to 0 some more? The pseudocode in Wikipedia uses a loop from 1 to the length (inclusive), but since Python starts at 0 and you add 1 to the length to keep the length inclusive, you don't need to do this here.

Your corrected function then looks like:

def longestCommonSeq(x,y):
    LCS = [[0 for z in range(len(y) + 1)] for z in range(len(x) + 1)] 
    for i in range(len(x)):
        for j in range(len(y)):
            if x[i] ==  y[j]:
                LCS[i + 1][j + 1] = 1 + LCS[i][j]
            else: 
                LCS[i + 1][j + 1] = max(LCS[i][j + 1], LCS[i + 1][j])

    res = ''
    i, j = len(x), len(y)

    while i and j:
        if LCS[i][j] == LCS[i-1][j]:
            i -= 1
        elif LCS[i][j] == LCS[i][j-1]: 
            j -= 1
        else:
            res = x[i-1] + res
            i -= 1
            j -= 1

    return res, LCS[-1][-1]

Because the LCS table is now correctly built we can use LCS[-1][-1] as the maximum length value too.

This produces the expected results:

>>> x = "AGGTAB"
>>> y = "GXTXAYB"
>>> print longestCommonSeq(x, y)
('GTAB', 4)
>>> a = "APBCADCQER" 
>>> b = "RASBTAUCVE"
>>> print longestCommonSeq(a, b)
('ABACE', 5)

The alternative would be to fix the off-by-one in your second phase and look at the LCS table cells that actually correspond to your input words; e.g. subtracting 1 from i and j:

def longestCommonSeq(x,y):
    LCS = [[0 for z in range(len(y)+1)] for z in  range(len(x)+1)] 

    for i in range(len(x)):
        for j in range(len(y)):
            if x[i] ==  y[j]:
                LCS[i][j] = 1 + LCS[i - 1][j - 1]
            else: 
                LCS[i][j] = max(LCS[i - 1][j], LCS[i][j - 1])

    i, j = len(x) - 1, len(y) - 1
    print LCS

    res = "" 
    while i >= 0 and j >= 0:
        if x[i] == y[j]:
            res= x[i] + res
            i -= 1
            j -= 1 
        else:
            if LCS[i - 1][j] > LCS[i][j - 1]:
                i -= 1
            else: 
                j -= 1

    return res, LCS[len(x) - 1][len(y) - 1]

Now i and j range from len(x) - 1 to 0 and len(y) - 1 to 0 respectively, and you are referencing the correct cells again.

这篇关于打印最长公共序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆