3个字符串中最长的公共子序列 [英] Longest Common Subsequence among 3 Strings

查看:118
本文介绍了3个字符串中最长的公共子序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经实施了动态编程解决方案,以在2个字符串中找到最长的公共子序列。显然,有一种方法可以概括该算法以在3个字符串中找到LCS,但是在我的研究中,我没有找到有关如何进行此操作的任何信息。任何帮助将不胜感激。

I've implemented the dynamic programming solution to find the longest common subsequence among 2 strings. There is apparently a way to generalize this algorithm to find the LCS among 3 strings, but in my research I have not found any information on how to go about this. Any help would be appreciated.

推荐答案

要查找2个字符串A和B的最长公共子序列(LCS),可以遍历对角的二维数组,如您发布的链接中所示。数组中的每个元素都与查找子字符串A’和B’的LCS问题相对应(A的行号,B的列号)。通过计算数组中所有元素的值可以解决此问题。您必须确定在计算数组元素的值时,已经解决了计算给定值所需的所有子问题。这就是为什么您对角遍历二维数组的原因。

To find the Longest Common Subsequence (LCS) of 2 strings A and B, you can traverse a 2-dimensional array diagonally like shown in the Link you posted. Every element in the array corresponds to the problem of finding the LCS of the substrings A' and B' (A cut by its row number, B cut by its column number). This problem can be solved by calculating the value of all elements in the array. You must be certain that when you calculate the value of an array element, all sub-problems required to calculate that given value has already been solved. That is why you traverse the 2-dimensional array diagonally.

此解决方案可以扩展为找到N个字符串之间最长的公共子序列,但这需要一种通用的迭代方法N维数组,以便仅当元素需要解决的所有子问题都已解决时,才能到达任何元素。

This solution can be scaled to finding the longest common subsequence between N strings, but this requires a general way to iterate an array of N dimensions such that any element is reached only when all sub-problems the element requires a solution to has been solved.

而不是迭代N维数组您也可以按特殊顺序递归解决问题。对于递归,保存中间解决方案很重要,因为许多分支机构都需要相同的中间解决方案。我在C#中编写了一个小示例来执行此操作:

Instead of iterating the N-dimensional array in a special order, you can also solve the problem recursively. With recursion it is important to save the intermediate solutions, since many branches will require the same intermediate solutions. I have written a small example in C# that does this:

string lcs(string[] strings)
{
    if (strings.Length == 0)
        return "";
    if (strings.Length == 1)
        return strings[0];
    int max = -1;
    int cacheSize = 1;
    for (int i = 0; i < strings.Length; i++)
    {
        cacheSize *= strings[i].Length;
        if (strings[i].Length > max)
            max = strings[i].Length;
    }
    string[] cache = new string[cacheSize];
    int[] indexes = new int[strings.Length];
    for (int i = 0; i < indexes.Length; i++)
        indexes[i] = strings[i].Length - 1;
    return lcsBack(strings, indexes, cache);
}
string lcsBack(string[] strings, int[] indexes, string[] cache)
{
    for (int i = 0; i < indexes.Length; i++ )
        if (indexes[i] == -1)
            return "";
    bool match = true;
    for (int i = 1; i < indexes.Length; i++)
    {
        if (strings[0][indexes[0]] != strings[i][indexes[i]])
        {
            match = false;
            break;
        }
    }
    if (match)
    {
        int[] newIndexes = new int[indexes.Length];
        for (int i = 0; i < indexes.Length; i++)
            newIndexes[i] = indexes[i] - 1;
        string result = lcsBack(strings, newIndexes, cache) + strings[0][indexes[0]];
        cache[calcCachePos(indexes, strings)] = result;
        return result;
    }
    else
    {
        string[] subStrings = new string[strings.Length];
        for (int i = 0; i < strings.Length; i++)
        {
            if (indexes[i] <= 0)
                subStrings[i] = "";
            else
            {
                int[] newIndexes = new int[indexes.Length];
                for (int j = 0; j < indexes.Length; j++)
                    newIndexes[j] = indexes[j];
                newIndexes[i]--;
                int cachePos = calcCachePos(newIndexes, strings);
                if (cache[cachePos] == null)
                    subStrings[i] = lcsBack(strings, newIndexes, cache);
                else
                    subStrings[i] = cache[cachePos];
            }
        }
        string longestString = "";
        int longestLength = 0;
        for (int i = 0; i < subStrings.Length; i++)
        {
            if (subStrings[i].Length > longestLength)
            {
                longestString = subStrings[i];
                longestLength = longestString.Length;
            }
        }
        cache[calcCachePos(indexes, strings)] = longestString;
        return longestString;
    }
}
int calcCachePos(int[] indexes, string[] strings)
{
    int factor = 1;
    int pos = 0;
    for (int i = 0; i < indexes.Length; i++)
    {
        pos += indexes[i] * factor;
        factor *= strings[i].Length;
    }
    return pos;
}

我的代码示例可以进一步优化。许多要缓存的字符串是重复的,而有些是重复的,只添加了一个附加字符。当输入字符串变大时,这将使用超出必要的空间。

My code example can be optimized further. Many of the strings being cached are duplicates, and some are duplicates with just one additional character added. This uses more space than necessary when the input strings become large.

在输入时: 666222054263314443712, 5432127413542377777, 6664664565464057425

On input: "666222054263314443712", "5432127413542377777", "6664664565464057425"

返回的LCS是 54442

The LCS returned is "54442"

这篇关于3个字符串中最长的公共子序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆