最长重复子串更好的复杂性 [英] Longest Repeated Substring Better Complexity

查看:81
本文介绍了最长重复子串更好的复杂性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我通过对后缀列表进行排序后比较字符串的后缀来实现一种解决方案。有没有比这段代码性能更好的线性时间算法?

I have implemented a solution by comparing the suffixes of a string after sorting the suffix list. Is there any linear time algorithm that performs better than this piece of code?

#include <iostream>
#include <cstring>
#include <algorithm>
using namespace std;
void preCompute(string input[],string s)
{
    int n = s.length();
    for(int i=0; i<n; i++)
        input[i] = s.substr(i,n);
}
string LongestCommonSubString(string first,string second)
{
    int n = min(first.length(),second.length());
    for(int i=0; i<n; i++)
        if(first[i]!=second[i])
            return first.substr(0,i);
    return first.substr(0,n);
}
string lrs(string s)
{
    int n = s.length();
    string input[n];
    preCompute(input,s);
    sort(input, input+n);
    string lrs = "";
    for(int i=0; i<n-1; i++)
    {
        string x = LongestCommonSubString(input[i],input[i+1]);
        if(x.length()>lrs.length())
        {
            lrs = x;
        }
    }
    return lrs;
}
int main()
{
    string input[2] = {"banana","missisipi"};
    for(int i=0;i<2;i++)
        cout<<lrs(input[i])<<endl;
    return 0;
}

我为这个问题找到了很好的资源。请参见此处

I found a really good resource for this question. See here

推荐答案

您可以在线性时间内构建后缀树(请参见)。最长的重复子字符串对应于最深的内部节点(当我说最深时,是指从根开始的路径具有最大字符数,而不是最大边缘数)。原因很简单。内部节点对应于出现在多个后缀中的后缀(即子字符串)的前缀。

You can build a suffix tree in linear time (see this). The longest repeated substring corresponds to the deepest internal node (when I say deepest I mean the path from the root has the maximum number of characters, not the maximum number of edges). The reason for that is simple. Internal nodes correspond to prefixes of suffixes (ie substrings) that occur in multiple suffixes.

实际上,这相当复杂。因此,您采用的方法足够好。我可以建议进行一些修改:

In reality, this is fairly complex. So the approach you are taking is good enough. I have a few modifications that I can suggest:


  1. 不要创建子字符串,子字符串可以用一对数字表示。当您需要实际的字符时,请查找原始字符串。实际上,后缀对应于一个索引(开始索引)。

  1. Do not create substrings, substrings can be denoted by a pair of numbers. When you need the actual characters, look up the original string. In fact suffixes, correspond to a single index (the start index).

每对连续后缀对的最长公共前缀可以在构造您的线性时间的后缀数组(但是O(n log n)算法要容易得多)。请查阅的参考。

The longest common prefix of every pair of consecutive suffixes, can be computed while constructing your suffix array in linear time (but O(n log n) algorithms are much easier). Consult the references of this.

如果如果真的坚持要在线性时间内运行整个程序,则可以在线性时间内构造后缀数组。我敢肯定,如果您进行一些搜索,就可以轻松找到指针。

If you really insist on running the whole thing in linear time, then you can construct the suffix array in linear time. I am sure if you search around a bit, you can easily find pointers.

有非常优雅(但不是线性)的实现,描述了此处

There are very elegant (but not linear) implementations described here.

这篇关于最长重复子串更好的复杂性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆