关于PATRICIA的混乱 [英] Confusion regarding PATRICIA

查看:114
本文介绍了关于PATRICIA的混乱的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据 libstdc ++文档的要点3和4, PATRICIA尝试有两种类型的节点:

According to points 3 and 4 of libstdc++ documentation, PATRICIA tries have two types of nodes:


(PATRICIA)类似于一棵树,但是具有以下
差异:

A (PATRICIA) trie is similar to a tree, but with the following differences:


  1. 它将密钥显式视为元素序列。特里可以将字符串视为字符序列;一个trie可以查看一个
    数字作为一个比特序列。

  1. It explicitly views keys as a sequence of elements. E.g., a trie can view a string as a sequence of characters; a trie can view a number as a sequence of bits.

它不是(必须)二进制。每个节点都有扇出n + 1,其中n是不同元素的数量。

It is not (necessarily) binary. Each node has fan-out n + 1, where n is the number of distinct elements.

它只存储在叶节点上的值。

It stores values only at leaf nodes.

内部节点具有A)每个至少有两个子节点的属性,B)每个节点与任何
后代共享相同的前缀。

Internal nodes have the properties that A) each has at least two children, and B) each shares the same prefix with any of its descendant.


我一直在阅读的书(C中的算法,Robert Sedgewick的第1-4部分)似乎描述了一个仅使用n个节点存储n个值的PATRICIA特技,使用内部节点存储值:

The book I've been reading (Algorithms in C, Parts 1-4 by Robert Sedgewick) seems to describe a PATRICIA trie storing n values with only n nodes, using internal nodes to store values:


像DST一样,patricia尝试允许在一棵树中搜索N个键,只需
N个节点。 ...我们通过另一个简单的设备避免外部节点:我们
在内部节点中存储数据,并用
链接替换到外部节点的链接,该链接指向特技中正确的内部节点

Like DSTs, patricia tries allow search for N keys in a tree with just N nodes. ... we avoid external nodes via another simple device: we store data in internal nodes and replace links to external nodes with links that point back upwards to the correct internal node in the trie

我想我很关心我的资源的准确性(编辑:...这些不是唯一的两个参考的争议在这个话题)。

I guess I'm concerned about the accuracy of my resources (edit: ... and these aren't the only two references in dispute on this topic). Which of these references are correct?

推荐答案

我继续搜索过去信誉来源的具体定义,以确认我怀疑我正在写信来提供我的发现。也许最重要的是定义PATRICIA的官方文件,由Morris Morris在1968年10月出版的ACM杂志:

I continued to search for a specific definition from past reputable sources to confirm what I had suspected, and I'm writing to provide my findings. Perhaps the most significant is the official paper defining PATRICIA, published by DR Morrison in October 1968s "Journal of the ACM":


PATRICIA从图书馆自动机[3]等研究。 ...
在这个演变的早期,决定字母表应该是
限于二进制。一个强烈影响这个
决定的定理是另一种形式归因于欧拉的定理。定理
表示如果字母是二进制的,那么分支数是
,恰好比结尾数少一个。推论指出,随着图书馆的增长,
每个新的一端都带有
一个新的分支,每个分支都有两个出口。这些
事实在索引的存储分配中非常有用。它们
意味着所需的总存储量完全由
的数量决定,所有实际需要的存储空间将被实际使用。

PATRICIA evolved from "A Library Automaton" [3] and other studies. ... Early in this evolution it was decided that the alphabet should be restricted to a binary one. A theorem which strongly influenced this decision is one which, in another form, is due to Euler. The theorem states that if the alphabet is binary, then the number of branches is exactly one less than the number of ends. Corollaries state that as the library grows, each new end brings into the library with it exactly one new branch, and each branch has exactly two exits. These facts are very useful in the allocation of storage for the index. They imply that the total storage required is completely determined by the number of ends, and all of the storage required will actually be used.

这肯定违反了libstdc ++引用的第2和第3点。在本文中还有进一步的证据,例如具体的算法细节,但上面的引文应该足够了。

This certainly contradicts points 2 and 3 of the libstdc++ reference. There's further evidence in this paper, such as specific algorithm details, but the quote above should suffice.

似乎没有任何偏离官方描述Sedgewick引用,但是。基于此,libstdc ++资源肯定比Sedgewick资源有效。

There don't appear to be any deviations from the official description in the Sedgewick quote, however. Based on that, the libstdc++ resource is certainly less valid than the Sedgewick resource.

这篇关于关于PATRICIA的混乱的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆