最长公共子串 [英] Longest Common Substring

查看:208
本文介绍了最长公共子串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们 A B 分别有两个字符串。 的长度大于或等于 B 。我们必须找出最长公共子串。如果有多个答案,那么我们就必须输出,来得早在 B (先前在其开始索引至上)的子字符串。

We have two strings a and b respectively. The length of a is greater than or equal to b. We have to find out the longest common substring. If there are multiple answers then we have to output the substring which comes earlier in b (earlier as in whose starting index comes first).

注:长度 A B 可以高达10 6

Note: The length of a and b can be up to 106.

我试图找到使用后缀阵列(排序使用快速排序的后缀)的最长公共子串。因为当有多个答案的情况下,我想推动所有常见的子字符串在栈等于最长公共子串的长度。

I tried to find the longest common substring using suffix array (sorting the suffixes using quicksort). For the case when there is more than one answer, I tried pushing all the common substrings in a stack which are equal to the length of the longest common substring.

我想知道有没有更快的方法来做到这一点?

I wanted to know is there any faster way to do so?

推荐答案

建立一个后缀树字符串 A $ B ,即 A 连有一些字符,比如 $ 两个字符串不存在,然后用串联b 。 A(COM pressed)后缀树可以建在O(| A | + | B |)时间和内存,并有O(| A | + | B |)。节点

Build a suffix tree of a string a$b, that is, a concatenated with some character like $ not occurring in both strings, then concatenated with b. A (compressed) suffix tree can be built in O(|a|+|b|) time and memory, and have O(|a|+|b|) nodes.

现在,对于每个节点,我们知道它的深度(由从根开始并且遍历树向下到该节点所获得的字符串的长度)。我们也可以保持两个布尔量的轨迹:在对应于 A ,以及它是否在对应于构建阶段走访<$ C构建阶段这个节点是否被访问$ C> B (例如,我们不妨分别建立了两棵树,然后使用pre序遍历它们合并)。现在,任务归结为寻找这是在这两个阶段,其可以通过一个单一的$ P $对 - 顺序遍历进行访问的最深顶点。可多选的情况下,应该很容易处理。

Now, for each node, we know its depth (the length of the string obtained by starting from the root and traversing the tree down to that node). We also can keep track of two boolean quantities: whether this node was visited during the build phase corresponding to a, and whether it was visited during the build phase corresponding to b (for example, we might as well build the two trees separately and then merge them using pre-order traversal). Now, the task boils down to finding the deepest vertex which was visited during both phases, which can be done by a single pre-order traversal. The case of multiple answers should be easy to handle.

维基百科页面包含该技术的另一个(简介)概述

This Wikipedia page contains another (brief) overview of the technique.

这篇关于最长公共子串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆