找出是否存在两个相邻的相同子字符串 [英] Finding out whether there exist two identical substrings one next to another

查看：339 发布时间：2020/7/25 21:15:45 string algorithm hash substring

本文介绍了找出是否存在两个相邻的相同子字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我们有一个字符串.

ABAEABABEABE

现在，我们必须检查是否存在一个子字符串，其后跟着另一个与第一个子字符串完全相同的子字符串.

Now we've got to check whether there exist a substring that is followed next by another substring that's exactly the same as the first one.

在此示例中: ABAEAB ABE ABE
ABE后跟ABE，这是两个相同的子字符串.

In this example: ABAEABABEABE
ABE is followed by ABE and that are two identical substrings.

在此示例中:

AAB

只是A，因为A后面跟着另一个A.

在此示例中:
ABCDEFGHIJKLMNO
没有这样的子字符串，因此答案为否.

In this example:

AAB

It would be simply A, beacuse A is followed by another A.

In this example:
ABCDEFGHIJKLMNO
There doesn't exist such a substring, so the answer would be NO.

我只设法找到了可以在O(n ^ 2)中运行的算法.那就是哈希值及其前缀.然后，对于每个字母，我们简单地展开并检查所有以该字母结尾的单词.有n个字母.我们需要将其扩展n次.所以是O(n ^ 2).我相信应该有一个O(n log n)算法来解决这个问题.

I only managed to find an algorithm that would run in O(n^2). That is getting Hashes and its prefixes. Then for each letter we expand simply and check all the words ending on that letter. There are n letters. We need to expand it n times. So it's O(n^2). I believe there should be a O(n log n) algorithm for this problem.

有人有更好的主意吗?

复杂性摘要

让 n 为查询字符串的长度.

Complexity summary

Let n be the length of the query string.

构建后缀树: O(n)
建立重复子串的最大堆: [3]
- 识别重复的子字符串(即内部节点)并将其存储在数组中: O(n)
- 堆数组: O(n)
- Building the suffix tree : O(n)
- Building the Max-heap of repeated substrings: [3]
  - Identifying the repeated substrings (ie. internal nodes) and storing them in an array: O(n)
  - Heapify the array: O(n)
  因此，最坏情况的总体复杂度是上述总和，并且是 O(n².log(n)).
  
  Hence the overall worst case complexity is the sum of the above and is O(n².log(n)).
  
  我在上面做了算法...因此它不是次优的，如果您足够勇敢，则可以遍历本文描述了线性时间算法！无论如何，后缀树是解决此问题的关键，因此我建议您仔细研究它们.
  
  I made the algorithm above... Hence it is suboptimal, if you are brave enough, you can go through this paper that describes a linear time algorithm! In any case, Suffix trees are a key to this problem so I suggest you study them thoroughly.
  
  [1] :警告，重复的子字符串可能会部分重叠！
  
  [1]: Warning, repeated substrings may partially overlap!
  
  [2] :实际上，最坏情况下的复杂性要比这个非常幼稚的上限要好，但是我不知道如何证明这一点(还好吗?！).例如，如果存在 n-2 个内部节点，则意味着原始字符串由出现了相同字符的 n 个组成.在这种情况下，我们检查的第一个子字符串是match =>，它是O(n.log(n)).
  
  [2]: Actually, the worst case complexity is better than this very naive upper bound but I don't know how to prove it (yet?!). For example, if there were n - 2 internal nodes, that would mean that the original string consists of n occurrences of the same character. In that case, the first substring we check is a match => it's O(n.log(n)).
  
  [3] :如果我们用常规排序方式( O(n.log(n)))替换堆构造，则最后的比较步骤将在 O(n²)而不是 O(n².log(n)) ...降低了 O(n.log(n))(由于排序步骤)和 O(n²).
  
  [3]: If we replace the heap construction by a regular sort (O(n.log(n))), the final comparison step runs in O(n²) instead of O(n².log(n)) ... Taking down the overall complexity between O(n.log(n)) (due to the sorting step) and O(n²).
  
  这篇关于找出是否存在两个相邻的相同子字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

找出是否存在两个相邻的相同子字符串 [英] Finding out whether there exist two identical substrings one next to another

问题描述

推荐答案

复杂性摘要

Complexity summary

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

找出是否存在两个相邻的相同子字符串 [英] Finding out whether there exist two identical substrings one next to another

问题描述

推荐答案

复杂性摘要

Complexity summary

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭