什么是找到一个子串出现的所有的最快方法? [英] What is the fastest way to find all occurrences of a substring?

查看:173
本文介绍了什么是找到一个子串出现的所有的最快方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这纯粹是出于好奇。我浏览通过比较各种字符串搜索算法的文章,发现他们都旨在找到一个匹配子字符串。这让我思考......如果我想找到一个子串出现的所有?

This is purely out of curiosity. I was browsing through an article comparing various string search algorithms and noticed they were all designed to find the first matching substring. This got me thinking... What if I wanted to find all occurrences of a substring?

我敢肯定,我可以创建一个使用KMP或BM的变体和倾倒每发现出现到一个数组的循环,但这似乎很难像它会是最快的。

I'm sure I could create a loop that used a variant of KMP or BM and dumped each found occurrence into an array but this hardly seems like it would be the fastest.

难道不是分而治之算法是优越?

Wouldn't a divide and conquer algorithm be superior?

比如让说你寻找一个字符串abbcacabbcabcacbccbabc序列ABC。

For instance lets say your looking for the sequence "abc" in a string "abbcacabbcabcacbccbabc".

  1. 在第一遍找到的第一个字符的所有事件和存储他们的位置。
  2. 在每个额外的传球使用来自preceding传球的位置找到下一个字符的所有匹配,降低了考生的下传与每个迭代。

考虑到与我想出了这个想法容易我假设已经有人想出了它,并在30年前后,它提高了。

Considering the ease with which I came up with this idea I assume someone already came up with it and improved upon it 30 years ago.

推荐答案

请参阅后缀阵列

应用程序

字符串的后缀阵列可   用作索引来快速定位   在每次发生一个子   该字符串。发现每发生   子串的相当于   发现每个后缀开头   子串。归功于   按字典序,这些   后缀将分组在   后缀数组,可以发现   有效地与一个二进制搜索。如果   直截了当地实施,这   二进制搜索需要O(mlogn)时,   其中m是的长度   子。为了避免重做   比较,额外的数据结构   提供有关最长信息   普通$ P $后缀pfixes(LCP材料)是   构造,使O(M + LOGN)搜索   时间。

The suffix array of a string can be used as an index to quickly locate every occurrence of a substring within the string. Finding every occurrence of the substring is equivalent to finding every suffix that begins with the substring. Thanks to the lexicographical ordering, these suffixes will be grouped together in the suffix array, and can be found efficiently with a binary search. If implemented straightforwardly, this binary search takes O(mlogn) time, where m is the length of the substring. To avoid redoing comparisons, extra data structures giving information about the longest common prefixes (LCPs) of suffixes are constructed, giving O(m + logn) search time.

这篇关于什么是找到一个子串出现的所有的最快方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆