动态编程:最长的公共子序列 [英] Dynamic Programming: Longest Common Subsequence

查看:101
本文介绍了动态编程:最长的公共子序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要遍历的笔记是在寻找两个等长字符串的最长公共子序列的上下文中讨论动态编程的。有问题的算法输出长度(而不是子字符串)。

I'm going over notes that discuss dynamic programming in the context of finding the longest common subsequence of two equal-length strings. The algorithm in question outputs the length (not the substring).

所以我有两个字符串,例如:

So I have two strings, say:

S = ABAZDC,T = BACBAD

S = ABAZDC, T = BACBAD

最长的公共子序列是ABAD(子串不必是相邻的字母)

Longest common subsequence is ABAD (substrings don't have to be adjacent letters)

算法如下,其中LCS [i,j]表示S [1..i]和T [1..j]的最长公共子序列:

Algorithm is as follows, where LCS[i, j] denotes longest common subsequence of S[1..i] and T[1..j]:

if S[i] != T[j] then
    LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
else
    LCS[i, j] = 1 + LCS[i - 1, j - 1]

我的笔记声称您可以填写表格,其中每个字符串都沿轴写入。像这样的东西:

My notes claim you can fill out a table where each string is written along an axis. Something like:

BACBAD

A 0 1 1 1 1 1

A 0 1 1 1 1 1

B 1 1 1 2 2 2

B 1 1 1 2 2 2

A ...

Z

D

C

两个问题:

1)我们实际上是如何开始填写此表的。算法是递归的,但似乎并不能提供基本的情况(否则,我将仅调用LCS [5,5])? Notes声称您可以对i和j进行两个简单的循环,并在恒定时间内填写每个点...

1) How do we actually start filling this table out. Algorithm is recursive but doesn't seem to provide a base case (otherwise I'd just call LCS[5, 5])? Notes claim you can do two simple loops with i and j and fill out each spot in constant time...

2)我们如何修改算法,以使最长的通用子序列将是相邻的字母?我的想法是,一旦我发现S中的下一个字母与T中的下一个字母不匹配,就必须将当前子序列的长度重置为0。但这很棘手,因为我想跟踪看到的最长到目前为止(我看到的第一个子序列可能是最长的)。因此,也许会有一个额外的参数longestThusFar,当我们最初调用算法并在后续调用中进行更改时,该参数为0。

2) How could we modify the algorithm so the longest common subsequence would be of adjacent letters? My thought is that I'd have to reset the length of the current subsequence to 0 once I find that the next letter in S doesn't match the next letter in T. But it's tricky because I want to keep track of the longest seen thus far (It's possible the first subsequence I see is the longest one). So maybe I'd have an extra argument, longestThusFar, that is 0 when we call our algorithm initially and changes in subsequent calls.

有人可以使这个更为严格吗?

Can someone make this a bit more rigorous?

谢谢!

推荐答案

首先,算法是递归的,但是实现总是迭代的。换句话说,我们没有明确地调用

Firstly,the algorithm is recursive,but the implementation is always iterative.In other words,we do not explicitly,call the same function from the function itself(Recursion).

我们使用已经填充的表项来补偿递归。

We use the table entries already filled to compensate for the recursion.

说,您有两个长度为 M 的字符串。

Say,you have two strings of length M.

然后定义了尺寸为(M + 1)的表X(M + 1)

for(i = 0 to M)
{
  LCS[0][i]=0;
}
for(i = 1 to M)
{
  LCS[i][0]=0;
}

您会得到类似

    B,A,C,B,A,D
  0,0,0,0,0,0,0
A 0
B 0
A 0
Z 0
D 0
C 0

0th col 中每个零表示如果不考虑字符串 BACBAD 的字符,则LCS的长度= 0。

Each zero in 0th col means that if no character of string BACBAD is considered,then length of LCS = 0.

第0行中的每个零表示如果不考虑字符串 ABAZDC 的字符,则LCS的长度= 0。

Each zero in 0th row means that if no character of string ABAZDC is considered,then length of LCS = 0.

其余条目使用您提到的规则填充。

The rest of entries are filled using the rules as you mentioned.

for(i = 1 to M)
{
 for(j = 1 to M)
 {
  if S[i-1] != T[j-1] then
    LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
  else
    LCS[i, j] = 1 + LCS[i - 1, j - 1]
 }
}

请注意其 S [i-1]!= T [j-1] 而不是 S [i]!= T [j] ,因为当您填写 LCS [i, j] ,您一直在比较 S的第i-1个字符和T的第j-1个字符。

Notice that its S[i-1] != T[j-1] and not S[i] != T[j] because when you fill LCS[i,j], you are always comparing i-1 th char of S and j-1 th char of T.

LCS的长度由 LCS [M,M]

最好的理解方法是手动尝试。

The best way to understand this is try it by hand.

回答第二个问题,您不需要修改算法。

解决方案位于表格中

为了检索LCS,我们制作了一个额外的表格 T ,其尺寸为 MXM 。并且我们对算法进行如下修改。

In order to retrieve the LCS, we make an extra table T of characters of dimensions MXM. and we modify the algo as follows.

for(i = 1 to M)
{
 for(j = 1 to M)
 {
  if S[i-1] != T[j-1] then
  {
     LCS[i, j] = max(LCS[i - 1, j], LCS[i, j - 1])
     if(LCS[i - 1, j]>=LCS[i, j - 1])          
      T[i-1][j-1]='u'//meaning up
     else T[i-1][j-1]='l'//meaning left
  }
  else
  {
    LCS[i, j] = 1 + LCS[i - 1, j - 1]
    T[i-1][j-1]='d'//meaning diagonally up
  }
 }
}

现在,为了知道两个子元素共有的最长子字符串(邻字母),对角线遍历T。

Now,in order to know longest substring(OF ADJACENT LETTERS) common to both,traverse T diagonally.

长度=连续的最大数目

The length = largest number of consecutive d's found in a diagonal.

任何方阵 NXN 的对角线遍历都是通过。

The diagonal traversal of any square matrix NXN is done by.

包含主要对角线的下三角形

j=N-1
while(j>=0)
{
 i=j;k=0;
 while(i <= N-1)
 {
  entry T[i][k];
  ++i;++k
 }
 --j;
}

上三角形

j=1;
while(j<=N-1)
{
 i=j;k=0;
 while(i<=N-1)
 {
  entry T[k][i];
  ++k;++i;
 }
 --j;
}

这篇关于动态编程:最长的公共子序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆