Algorithm: Find all common substrings between two strings where order is preserved


Problem description

I would like to discuss the algorithm, not code.

Problem: Let S and T be two sequences of elements. Find the common subsequences between them such that the order of the elements is preserved.

It should run in O(n + m) time, where n is the length of S and m is the length of T. I would also like to assume that, for the most part, the two sequences will be similar.

An optimal solution? After some research, one solution that appears to be optimal is to first build a generalised suffix tree for the two sequences, then find the longest common substring and treat it as part of the solution. Next, either remove this substring from the tree, or remove it from the two original sequences to form S' and T' and build a new suffix tree for them. Then find the longest common substring of S' and T', and so on.
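The repeated "take the longest common substring, then continue on what is left" idea can be sketched as follows. This is only an illustration under stated assumptions: it swaps the generalised suffix tree for a plain dynamic-programming search (O(n·m) per round, so it does not meet the O(n + m) goal), and instead of concatenating the leftovers it recurses separately on the parts to the left and right of each match, which is one way to keep the order preserved. The names `longest_common_substring` and `common_blocks` are hypothetical.

```python
def longest_common_substring(s, t):
    """Return (i, j, k) such that s[i:i+k] == t[j:j+k] with k maximal.

    Plain DP stand-in for the suffix-tree query (assumption, not the
    method from the question). Returns k == 0 if nothing is shared.
    """
    best = (0, 0, 0)
    prev = [0] * (len(t) + 1)  # prev[j + 1]: common-suffix length ending at s[i-1], t[j]
    for i in range(len(s)):
        cur = [0] * (len(t) + 1)
        for j in range(len(t)):
            if s[i] == t[j]:
                cur[j + 1] = prev[j] + 1
                if cur[j + 1] > best[2]:
                    k = cur[j + 1]
                    best = (i - k + 1, j - k + 1, k)
        prev = cur
    return best


def common_blocks(s, t, s_off=0, t_off=0):
    """Greedy, order-preserving list of (start_in_s, start_in_t, length)."""
    i, j, k = longest_common_substring(s, t)
    if k == 0:
        return []
    # Recurse on the pieces before and after the match so that no later
    # block can cross it; offsets map positions back to the full inputs.
    left = common_blocks(s[:i], t[:j], s_off, t_off)
    right = common_blocks(s[i + k:], t[j + k:], s_off + i + k, t_off + j + k)
    return left + [(s_off + i, t_off + j, k)] + right


blocks = common_blocks("abcXdefYghi", "abcdefZghi")
print(blocks)
```

Because the choice is greedy, the blocks found this way are not guaranteed to maximise the total number of matched elements; the recursion depth also grows with the number of blocks, so very long inputs would need an iterative version.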

To analyze the running time: building the generalised suffix tree takes O(n + m), and you can find the lengths and starting positions of the longest common substrings of S and T in O(n + m).

Are there other (more) practical solutions that someone knows of or can link to? Are there any published papers that consider the same or a related problem? Any input or constructive criticism about the above solution? Thanks for your time!

Recommended answer

My first thought was to use a suffix tree and relate this to the LCS problem, but off the top of my head I am not sure what a better solution would be. I did a quick search and came across a few papers and projects that might be useful, but no guarantees.
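For reference, here is a minimal sketch of the textbook dynamic-programming solution to the LCS problem, assuming "LCS" here means longest common subsequence. It runs in O(n·m) time and space, so it illustrates the related problem rather than the O(n + m) target; the function name `lcs` is hypothetical.

```python
def lcs(s, t):
    """Return one longest common subsequence of s and t as a list."""
    n, m = len(s), len(t)
    # table[i][j] = length of the LCS of s[i:] and t[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if s[i] == t[j]:
            out.append(s[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


print("".join(lcs("abcXdefYghi", "abcdefZghi")))
```

Consecutive matched elements in the result can be grouped into runs to obtain common substrings in order.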

http://dl.acm.org/citation.cfm?id=1625377 (direct link here, I believe: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-101.pdf)

http://code.google.com/p/all-common-subsequences/

Sorry, it has been a long day and I am not quite awake enough to attempt a better solution myself.
