要找到所有的重复子给定的字符串中 [英] To find all the repeating substring in a given string

查看:193
本文介绍了要找到所有的重复子给定的字符串中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我recetly遇到的面试问题: 为了找到一个给定字符串为2的最小尺寸在所有的重复子串。 该算法应该是高效的。

I recetly come across an interview question : To find all the repeating substring in a given string with a minimal size of 2. The algorithm should be efficient one.

code为以上问题给出如下,但它是没有效率的。

Code for above question is given below but it isn't efficient one.

#include <iostream>
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using namespace std;

int main()
{
    typedef string::const_iterator iterator;
    string s("ABCFABHYIFAB");
    set<string> found;

    if (2 < s.size())
        for (iterator i = s.begin() + 1, j = s.end(); i != j; ++i)
            for (iterator x = s.begin(); x != i; ++x)
            {
                iterator tmp = mismatch(i, j, x).second;;
                if (tmp - x > 1)
                    found.insert(string(x, tmp));
            }

            copy(found.begin(), found.end(),ostream_iterator<string>(cout, "\n"));
}

我的问题是,是否有可以实现的时间上面的问题的任何数据结构 复杂性为O(N)?

My question is that, is there any data structure which can implement above question in time complexity of O(N)?

如果你的答案是后缀树或散列请详细说明了。

If your answer is Suffix tree or Hashing please elaborate it.

推荐答案

如果你分析输出字符串AAAAAAAAAAAAAA,则有O(N²)字符它,因此算法至少O(N²)。

If you analyze the output for the string "AAAAAAAAAAAAAA", then there are O(n²) characters in it, so the algorithm is at least O(n²).

要实现O(N²),只是建立在后缀树获得第每个后缀(指数为[1..N],[第2..N],[3..n],...,[n..n])。这不要紧,如果其中一个字符串没有自己的终端节点,算了算多久每个节点使用。

To achieve O(n²), just build the suffix tree for each suffix of s (indices [1..n], [2..n], [3..n], ..., [n..n]). It doesn't matter if one of the strings has no own end node, just count how often each node is used.

最后,遍历每个节点计数> 1,并打印其路径。

At the end, iterate over each node with count>1 and print its path.

这篇关于要找到所有的重复子给定的字符串中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆