计数子的数量 [英] Counting the number of substrings

查看:98
本文介绍了计数子的数量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想问一下,如果有一个算法,用于计算一个字符串中的O(n)时间的子串离散occurencies的数量。

I would like to ask if there is an algorithm for counting the number of discrete occurencies of a substring in a string in O(n) time.

推荐答案

您可以建立在线性时间的文本后缀树。然后,搜索在后缀树中的子串;当你找到它,算叶节点的匹配节点下的数量。这是O(M + K)为长度为m的子串出现k次(添加的n术语用于构建后缀树)。或者,您可以使用$深度优先遍历树p $ pcalculate后代的数量为每个节点 - 这将给O(M)查询

You can build a suffix tree on the text in linear time. Then, search for your substring in the suffix tree; when you find it, count the number of leaf nodes beneath the matching node. This is O(m + k) for a substring of length m that appears k times (add an n term for building the suffix tree). Or, you can precalculate the number of descendants for each node in the tree using a depth-first traversal -- this will give O(m) queries.

有关大文章,后缀阵列往往比后缀树在实践中速度快,但寻找一个单一的长度-M串由O(M)下降到O(M登入)。在这种情况下,一个特定子串的所有匹配将显示为后缀阵列中的连续块,所以这个块的宽度是出现的次数。

For large texts, suffix arrays are often faster than suffix trees in practice, despite searching for a single length-m string dropping from O(m) to O(m log m). In this case, all occurrences of a particular substring will appear as a contiguous block in the suffix array, so the width of this block is the number of occurrences.

这篇关于计数子的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆