打印最长公共序列 [英] Print the longest common sequence
问题描述
我想实现的经典问题的解决办法,但我很困惑,我得到一些情况下,我没有找到很好的效果。
输出:最长公共序列,长度
高清longestCommonSeq(X,Y):
LCS = [0用于z的范围(LEN(Y)+1)用于z范围内(LEN(X)+1)]
因为我在范围内(LEN(X)):
LCS [I] [0] = 0
对于j的范围(LEN(y))为:
LCS [0] [J] = 0
因为我在范围内(LEN(X)):
对于j的范围(LEN(y))为:
如果x [我] == Y [J]:
LCS [I] [j]的= 1 + LCS [I-1] [J-1]
其他:
LCS [I] [J] = MAX(LCS [I-1] [J],LCS [I] [J-1])
I,J = len个(x)的,LEN(y)的
打印LCS
RES =
虽然我大于0且j大于0:
如果x [I-1] == Y [J-1]:
解析度= X [I-1] +水库
I- = 1
J- = 1
其他:
如果LCS [I-1] [j]的> LCS [I] [J-1]:
I- = 1
其他:
J- = 1
返回水库,LCS [LEN(X)-1] [LEN(Y)-1]
对于第一种情况:
X =AGGTAB
Y =GXTXAYB
打印longestCommonSeq(X,Y)
结果: (GTAB',4)
这是对的!
但情况如下:
A =APBCADCQER
B =RASBTAUCVE
打印longestCommonSeq(A,B)
('',5)
我错过了什么?,谁能给一个提示?
您正在建表错误的;你应该离开一行 I = 0
和 J = 0
柱在0完全,并把比赛进入较高的行。这意味着你需要加1 我
和Ĵ
坐标在第一循环中,像Python开始指数0
这样一来,你最终以0在最后一台;使用类似于维基百科的解释的本质就是建立这个符号:
<$p$p><$c$c>┌─┬─────┬─────────┬─────────┬─────┬──────┬───────┬─┐ ││A│G│G│T│A│B││ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │G│&LT; ^0│\ G变│\ G变│&LT;摹│&LT;摹│&LT;摹│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │X│&LT; ^ ^0││摹&LT; ^摹│&LT; ^G│&LT; ^摹│&LT; ^摹│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │T│&LT; ^ ^0││摹&LT; ^摹│\ GT│&LT; GT│&LT; GT│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │X│&LT; ^ ^0││摹&LT; ^摹│^ GT│&LT; ^GT│&LT; ^ GT│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │A│\ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│\横行霸道│&LT; GTA│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │I│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│&LT; ^GTA│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ │B│^ A│&LT; ^ A | G│&LT; ^ A | G│^ GT│^ GTA│\ GTAB│0│ ├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤ ││0│0│0│0│0│0│0│ └─┴─────┴─────────┴─────────┴─────┴──────┴───────┴─┘ &LT;和^:最大挑 \:匹配字符添加到细胞的左上角。这是留给你的最后一列和行设置为0而不是第一个。这是你的运气了Python除$ P $点 1
作为最后一个元素,因为 I = 0
或 J = 0
这正是你最终做的事情。你想要什么,而不是是:
接下来,你并不需要咨询 X
和是
在第二循环中,你咨询的只是的你的 LCS
表。因为你结束了 0
,总是最后一排和列中,你的算法的版本放平当 X 最后一个字符code>和
是
不匹配,你只落得递减我
和 Ĵ
,你不能打LCS路径列于表几乎是随机。
请注意,你的前两个为
环路完全是多余的,可以删除;你的整个 LCS
表仅包含 0
值,为什么它们设置为 0
更多一些?伪code在维基百科使用从1循环的长度(含),但由于Python从0开始,你加1的长度,保持长度包容,你不需要在这里做这个。
您校正功能则是这样的:
高清longestCommonSeq(X,Y):
LCS = [[0用于z在范围(LEN(γ)+ 1)]用于z在范围(LEN(X)+1)]
因为我在范围内(LEN(X)):
对于j的范围(LEN(y))为:
如果x [我] == Y [J]:
LCS [I + 1] [J + 1] = 1 + LCS [I] [j]的
其他:
LCS [I + 1] [J + 1] = MAX(LCS [I] [J + 1],LCS [I + 1] [J]。)
RES =''
I,J = len个(x)的,LEN(y)的
而i和j:
如果LCS [I] [J] == LCS [I-1] [J]:
我 - = 1
ELIF LCS [I] [J] == LCS [I] [J-1]:
的J - = 1
其他:
解析度= X [I-1] +水库
我 - = 1
的J - = 1
返回水库,LCS [-1] [ - 1]
由于该 LCS
表现正确建立,我们可以使用 LCS [-1] [ - 1]
作为最大长度值了。
这将产生预期的结果:
&GT;&GT;&GT; X =AGGTAB
&GT;&GT;&GT; Y =GXTXAYB
&GT;&GT;&GT;打印longestCommonSeq(X,Y)
('GTAB',4)
&GT;&GT;&GT; A =APBCADCQER
&GT;&GT;&GT; B =RASBTAUCVE
&GT;&GT;&GT;打印longestCommonSeq(A,B)
('ABACE',5)
另一种方法是在你的第二个阶段解决掉接一个,看看实际符合您输入单词的LCS表格单元格;例如减去1 我
和Ĵ
:
高清longestCommonSeq(X,Y):
LCS = [0用于z的范围(LEN(Y)+1)用于z范围内(LEN(X)+1)]
因为我在范围内(LEN(X)):
对于j的范围(LEN(y))为:
如果x [我] == Y [J]:
LCS [I] [J] = 1 + LCS [我 - 1] [J - 1]
其他:
LCS [I] [J] = MAX(LCS [我 - 1] [J],LCS [I] [J - 1])
I,J = len个(X) - 1,LEN(Y) - 1
打印LCS
RES =
而I&GT; = 0和J&GT; = 0:
如果x [我] == Y [J]:
RES = X [I] +资源
我 - = 1
的J - = 1
其他:
如果LCS [我 - 1] [J] GT; LCS [I] [J - 1]:
我 - = 1
其他:
的J - = 1
返回水库,LCS [LEN(X) - 1] [LEN(Y) - 1]
现在我
和Ĵ
为 LEN(X) - 1
到 0
和 LEN(Y) - 1
到 0
分别,并且再次引用正确的细胞。
I am trying to implement solution of the classical problem, but I am confused as I get some cases where I don't find good results.
Output: "The Longest common sequence", "Length"
def longestCommonSeq(x,y):
LCS = [[0 for z in range(len(y)+1)] for z in range(len(x)+1)]
for i in range(len(x)):
LCS[i][0]=0
for j in range(len(y)):
LCS[0][j]=0
for i in range(len(x)):
for j in range(len(y)):
if x[i]== y[j]:
LCS[i][j]= 1+LCS[i-1][j-1]
else:
LCS[i][j]= max(LCS[i-1][j],LCS[i][j-1])
i, j= len(x),len(y)
print LCS
res = ""
while i>0 and j>0:
if x[i-1]== y[j-1]:
res= x[i-1] + res
i-=1
j-=1
else:
if LCS[i-1][j]>LCS[i][j-1]:
i-=1
else:
j-=1
return res, LCS[len(x)-1][len(y)-1]
For the first case:
x= "AGGTAB"
y= "GXTXAYB"
print longestCommonSeq(x,y)
Result: ('GTAB', 4)
This is right!
but for case like:
a= "APBCADCQER"
b= "RASBTAUCVE"
print longestCommonSeq(a,b)
('', 5)
What did I miss ?, Can anyone give a hint ?
You are building the table wrong; you should leave the row at i = 0
and the column at j = 0
at 0 entirely, and put matches into the higher rows. This means you need to add 1 to the i
and j
coordinates in the first loop, as Python starts indices at 0.
As a result, you end up with a table with the 0s at the end; using notation similar to the explanation on Wikipedia you essentially build this:
┌─┬─────┬─────────┬─────────┬─────┬──────┬───────┬─┐
│ │A │G │G │T │A │B │ │
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│G│< ^ 0│\ G │\ G │< G │< G │< G │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│X│< ^ 0│^ G │< ^ G │< ^ G│< ^ G │< ^ G │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│T│< ^ 0│^ G │< ^ G │\ GT │< GT │< GT │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│X│< ^ 0│^ G │< ^ G │^ GT │< ^ GT│< ^ GT │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│A│\ A │< ^ A | G│< ^ A | G│^ GT │\ GTA │< GTA │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│I│^ A │< ^ A | G│< ^ A | G│^ GT │^ GTA │< ^ GTA│0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│B│^ A │< ^ A | G│< ^ A | G│^ GT │^ GTA │\ GTAB │0│
├─┼─────┼─────────┼─────────┼─────┼──────┼───────┼─┤
│ │0 │0 │0 │0 │0 │0 │0│
└─┴─────┴─────────┴─────────┴─────┴──────┴───────┴─┘
< and ^: maximum picked
\: add matching character to cell to the top left.
This is leaving your last column and row set to 0 rather than the first. It is your luck that Python interprets -1
as the last element, because at i = 0
or j = 0
that is exactly what you end up doing. What you want instead is:
┌─┬─┬─────┬─────────┬─────────┬─────┬──────┬───────┐
│ │ │A │G │G │T │A │B │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│ │0│0 │0 │0 │0 │0 │0 │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│G│0│< ^ 0│\ G │\ G │< G │< G │< G │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│X│0│< ^ 0│^ G │< ^ G │< ^ G│< ^ G │< ^ G │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│T│0│< ^ 0│^ G │< ^ G │\ GT │< GT │< GT │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│X│0│< ^ 0│^ G │< ^ G │^ GT │< ^ GT│< ^ GT │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│A│0│\ A │< ^ A | G│< ^ A | G│^ GT │\ GTA │< GTA │
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│I│0│^ A │< ^ A | G│< ^ A | G│^ GT │^ GTA │< ^ GTA│
├─┼─┼─────┼─────────┼─────────┼─────┼──────┼───────┤
│B│0│^ A │< ^ A | G│< ^ A | G│^ GT │^ GTA │\ GTAB │
└─┴─┴─────┴─────────┴─────────┴─────┴──────┴───────┘
Next, you don't need to consult x
and y
in the second loop, you consult just your LCS
table. Because you ended up with 0
, always, in the last row and column, your version of the algorithm falls flat when the last character of x
and y
do not match and you just end up decrementing i
and j
almost randomly as you fail to hit the LCS path set out in the table.
Note that your first two for
loops are entirely redundant and can be removed; your whole LCS
table contains only 0
values, why set them to 0
some more? The pseudocode in Wikipedia uses a loop from 1 to the length (inclusive), but since Python starts at 0 and you add 1 to the length to keep the length inclusive, you don't need to do this here.
Your corrected function then looks like:
def longestCommonSeq(x,y):
LCS = [[0 for z in range(len(y) + 1)] for z in range(len(x) + 1)]
for i in range(len(x)):
for j in range(len(y)):
if x[i] == y[j]:
LCS[i + 1][j + 1] = 1 + LCS[i][j]
else:
LCS[i + 1][j + 1] = max(LCS[i][j + 1], LCS[i + 1][j])
res = ''
i, j = len(x), len(y)
while i and j:
if LCS[i][j] == LCS[i-1][j]:
i -= 1
elif LCS[i][j] == LCS[i][j-1]:
j -= 1
else:
res = x[i-1] + res
i -= 1
j -= 1
return res, LCS[-1][-1]
Because the LCS
table is now correctly built we can use LCS[-1][-1]
as the maximum length value too.
This produces the expected results:
>>> x = "AGGTAB"
>>> y = "GXTXAYB"
>>> print longestCommonSeq(x, y)
('GTAB', 4)
>>> a = "APBCADCQER"
>>> b = "RASBTAUCVE"
>>> print longestCommonSeq(a, b)
('ABACE', 5)
The alternative would be to fix the off-by-one in your second phase and look at the LCS table cells that actually correspond to your input words; e.g. subtracting 1 from i
and j
:
def longestCommonSeq(x,y):
LCS = [[0 for z in range(len(y)+1)] for z in range(len(x)+1)]
for i in range(len(x)):
for j in range(len(y)):
if x[i] == y[j]:
LCS[i][j] = 1 + LCS[i - 1][j - 1]
else:
LCS[i][j] = max(LCS[i - 1][j], LCS[i][j - 1])
i, j = len(x) - 1, len(y) - 1
print LCS
res = ""
while i >= 0 and j >= 0:
if x[i] == y[j]:
res= x[i] + res
i -= 1
j -= 1
else:
if LCS[i - 1][j] > LCS[i][j - 1]:
i -= 1
else:
j -= 1
return res, LCS[len(x) - 1][len(y) - 1]
Now i
and j
range from len(x) - 1
to 0
and len(y) - 1
to 0
respectively, and you are referencing the correct cells again.
这篇关于打印最长公共序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!