节点的一个后缀树最大和最小数 [英] Maximum and minimum number of nodes in a suffix tree

查看:164
本文介绍了节点的一个后缀树最大和最小数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

哪些节点在一个后缀树的最大和最小数目?我怎么能证明呢?

What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?

推荐答案

假设 N 字符长,节点的最小数量的输入文本,包括根节点和所有的叶子节点,是 N + 1 ,节点的最大数量,包括根和叶,是 2N-1

Assuming an input text of N characters in length, the minimum number of nodes, including the root node and all leaf nodes, is N+1, the maximum number of nodes, including the root and leaves, is 2N-1.

证明最小的:都必须有每个后缀中的至少一个叶子结点,并有 N 后缀。不必有任何内在的节点,例如:如果文本是一个序列的独特符号, ABC $ ,没有分支,在生成后缀树,因此没有内在的节点

Proof of minimum: There must be at least one leaf node for every suffix, and there are N suffixes. There need not be any inner nodes, example: if the text is a sequence of unique symbols, abc$, there are no branches, hence no inner nodes in the resulting suffix tree:

因此​​,最低为 N 树叶, 0 内部节点,而 1 根节点,总之 N + 1 节点。

Hence the minimum is N leaves, 0 inner nodes, and 1 root node, in sum N+1 nodes.

证明最大:叶节点的数量不能超过 N时,因为叶节点就是一个后缀结束,你不能比 N 不同的后缀长度 N 的字符串有更多的。 (事实上​​,你总是有完全 N 不同的后缀,因此 N 叶节点完全吻合。)的根节点始终是正是 1 ,所以问题是什么是内部节点的最大数量。每一个内部节点引入了树的一个分支(因为后缀树的内部节点至少有2名儿童)。每一个新的分支必须最终导致至少一个额外的叶节点,所以如果你有 K 内部节点,必须有至少 K + 1 叶节点和根节点的presence至少需要一个额外的叶(除非树为空)。但叶节点的数目由 N ,所以内部节点的最大数量由界N-2 界。这会产生完全 N 树叶, 1 根,最大的N-2 内部节点,共2N-1

Proof of maximum: The number of leaf nodes can never be larger than N, because a leaf node is where a suffix ends, and you can't have more than N distinct suffixes in a string of length N. (In fact, you always have exactly N distinct suffixes, hence N leaf nodes exactly.) The root node is always exactly 1, so the question is what is the maximum number of inner nodes. Every inner node introduces a branch in the tree (because inner nodes of a suffix tree have at least 2 children). Each new branch must eventually lead to at least one extra leaf node, so if you have K inner nodes, there must be at least K+1 leaf nodes, and the presence of the root node requires at least one additional leaf (unless the tree is empty). But the number of leaf nodes is bounded by N, so the maximum number of inner nodes is bounded by N-2. This yields exactly N leaves, 1 root, and a maximum of N-2 inner nodes, 2N-1 in total.

要看到,这不仅是一个理论上限,但一些后缀树居然达到这个最大,可以考虑,例如,一个字符串只有一个重复的字符:'AAA $。确认后缀树为此有7个节点(包括根和叶):

To see that this is not only a theoretical upper bound, but some suffix trees actually reach this maximum, consider as an example a string with just one repeated character: 'aaa$'. Confirm that the suffix tree for this has 7 nodes (including root and leaves):

摘要:作为明显的,唯一真正的变量是内节点的数量;在 1 根和叶的数量是恒定的, N 所有后缀的树木,而内部节点的数量变化在 0 N-2

Summary: As evident, the only real variable is the number of inner nodes; the number of roots and leaves is constant at 1 and N for all suffix trees, while the number of inner nodes varies between 0 and N-2.

这篇关于节点的一个后缀树最大和最小数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆