计数子的数量 [英] Counting the number of substrings
问题描述
我想问一下,如果有一个算法,用于计算一个字符串中的O(n)时间的子串离散occurencies的数量。
I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.
推荐答案
您可以建立在线性时间的文本后缀树。然后,搜索在后缀树中的子串;当你找到它,算叶节点的匹配节点下的数量。这是O(M + K)为长度为m的子串出现k次(添加的n术语用于构建后缀树)。或者,您可以使用$深度优先遍历树p $ pcalculate后代的数量为每个节点 - 这将给O(M)查询
You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.
有关大文章,后缀阵列往往比后缀树在实践中速度快,但寻找一个单一的长度-M串由O(M)下降到O(M登入)。在这种情况下,一个特定子串的所有匹配将显示为后缀阵列中的连续块,所以这个块的宽度是出现的次数。
For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.
这篇关于计数子的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!