节点的一个后缀树最大和最小数 [英] Maximum and minimum number of nodes in a suffix tree
问题描述
哪些节点在一个后缀树的最大和最小数目?我怎么能证明呢?
What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?
推荐答案
假设 N
字符长,节点的最小数量的输入文本,包括根节点和所有的叶子节点,是 N + 1
,节点的最大数量,包括根和叶,是 2N-1
。
Assuming an input text of N
characters in length, the minimum number of nodes, including the root node and all leaf nodes, is N+1
, the maximum number of nodes, including the root and leaves, is 2N-1
.
证明最小的:都必须有每个后缀中的至少一个叶子结点,并有 N
后缀。不必有任何内在的节点,例如:如果文本是一个序列的独特符号, ABC $
,没有分支,在生成后缀树,因此没有内在的节点
Proof of minimum: There must be at least one leaf node for every suffix, and there are N
suffixes. There need not be any inner nodes, example: if the text is a sequence of unique symbols, abc$
, there are no branches, hence no inner nodes in the resulting suffix tree:
因此,最低为 N
树叶, 0
内部节点,而 1
根节点,总之 N + 1
节点。
Hence the minimum is N
leaves, 0
inner nodes, and 1
root node, in sum N+1
nodes.
证明最大:叶节点的数量不能超过 N时
,因为叶节点就是一个后缀结束,你不能比 N
不同的后缀长度 N
的字符串有更多的。 (事实上,你总是有完全 N
不同的后缀,因此 N
叶节点完全吻合。)的根节点始终是正是 1
,所以问题是什么是内部节点的最大数量。每一个内部节点引入了树的一个分支(因为后缀树的内部节点至少有2名儿童)。每一个新的分支必须最终导致至少一个额外的叶节点,所以如果你有 K
内部节点,必须有至少 K + 1
叶节点和根节点的presence至少需要一个额外的叶(除非树为空)。但叶节点的数目由 N
,所以内部节点的最大数量由界N-2
界。这会产生完全 N
树叶, 1
根,最大的N-2
内部节点,共2N-1
。
Proof of maximum: The number of leaf nodes can never be larger than N
, because a leaf node is where a suffix ends, and you can't have more than N
distinct suffixes in a string of length N
. (In fact, you always have exactly N
distinct suffixes, hence N
leaf nodes exactly.) The root node is always exactly 1
, so the question is what is the maximum number of inner nodes. Every inner node introduces a branch in the tree (because inner nodes of a suffix tree have at least 2 children). Each new branch must eventually lead to at least one extra leaf node, so if you have K
inner nodes, there must be at least K+1
leaf nodes, and the presence of the root node requires at least one additional leaf (unless the tree is empty). But the number of leaf nodes is bounded by N
, so the maximum number of inner nodes is bounded by N-2
. This yields exactly N
leaves, 1
root, and a maximum of N-2
inner nodes, 2N-1
in total.
要看到,这不仅是一个理论上限,但一些后缀树居然达到这个最大,可以考虑,例如,一个字符串只有一个重复的字符:'AAA $。确认后缀树为此有7个节点(包括根和叶):
To see that this is not only a theoretical upper bound, but some suffix trees actually reach this maximum, consider as an example a string with just one repeated character: 'aaa$'. Confirm that the suffix tree for this has 7 nodes (including root and leaves):
摘要:作为明显的,唯一真正的变量是内节点的数量;在 1
根和叶的数量是恒定的, N
所有后缀的树木,而内部节点的数量变化在 0
和 N-2
。
Summary: As evident, the only real variable is the number of inner nodes; the number of roots and leaves is constant at 1
and N
for all suffix trees, while the number of inner nodes varies between 0
and N-2
.
这篇关于节点的一个后缀树最大和最小数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!