Find longest repetitive sequence in a string

Problem description

I need to find the longest sequence in a string with the caveat that the sequence must be repeated three or more times. So, for example, if my string is:

fdwaw4helloworldvcdv1c3xcv3xcz1sda21f2sd1ahelloworldgafgfa4564534321fadghelloworld

then I'd like the value helloworld to be returned.

I know of a few ways of accomplishing this but the problem I'm facing is that the actual string is absurdly large so I'm really looking for a method that can do it in a timely fashion.

Recommended answer

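This problem is a variant of the longest repeated substring problem (http://en.wikipedia.org/wiki/Longest_repeated_substring_problem), and there is an O(n)-time algorithm for solving it that uses suffix trees. The idea (as suggested by Wikipedia) is to construct a suffix tree (time O(n)), annotate every node with the number of leaves beneath it (time O(n) using a DFS), and then find the deepest internal node, measured by the length of the string it spells out, that has at least three leaf descendants (time O(n) using a DFS). Each leaf corresponds to one suffix, so a node with three or more leaves below it spells a substring that occurs at least three times. The overall algorithm takes time O(n).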
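To make the idea concrete without building a suffix tree, here is a minimal sketch that swaps in a suffix array with adjacent longest-common-prefix (LCP) values. This is not the suffix-tree algorithm described above, and the function name `longest_repeated` and its parameter `k` are my own, not from any library. In sorted suffix order, any k consecutive suffixes share a prefix whose length is the minimum of the k-1 LCP values between them, which plays the role of "a node with at least k leaves". Note that the naive construction here is far from O(n), so for a truly huge string you would want a linear-time suffix array or suffix tree implementation instead.

```python
def longest_repeated(s: str, k: int = 3) -> str:
    """Return the longest substring of s that occurs at least k times.

    Occurrences may overlap. Returns "" if no such substring exists.
    Illustrative sketch only: the suffix array is built naively.
    """
    n = len(s)
    if k < 2:
        return s  # every string occurs at least once in itself
    if n < k:
        return ""

    # Naive suffix array: suffix start positions sorted by suffix text.
    sa = sorted(range(n), key=lambda i: s[i:])

    def common_prefix(a: int, b: int) -> int:
        """Length of the common prefix of s[a:] and s[b:]."""
        length = 0
        while a + length < n and b + length < n and s[a + length] == s[b + length]:
            length += 1
        return length

    # lcp[j] is the common prefix length of the suffixes at sa[j] and sa[j + 1].
    lcp = [common_prefix(sa[j], sa[j + 1]) for j in range(n - 1)]

    best_len, best_start = 0, 0
    # Slide a window over k consecutive suffixes, i.e. k - 1 adjacent LCP values.
    for j in range(len(lcp) - (k - 2)):
        shared = min(lcp[j:j + k - 1])
        if shared > best_len:
            best_len, best_start = shared, sa[j]
    return s[best_start:best_start + best_len]


text = ("fdwaw4helloworldvcdv1c3xcv3xcz1sda21f2sd1a"
        "helloworldgafgfa4564534321fadghelloworld")
print(longest_repeated(text))
```

With the string from the question this should print helloworld. Passing a different `k` changes the required number of repetitions.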

That said, suffix trees are notoriously hard to construct, so you would probably want to find a Python library that implements suffix trees for you before attempting this implementation. A quick Google search turns up this library, though I'm not sure whether this is a good implementation.

Hope this helps!
