特里(preFIX树)在Python [英] Trie (Prefix Tree) in Python

查看:227
本文介绍了特里(preFIX树)在Python的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我不知道这是不是询问有关算法的地方。但是,让我们看看,如果我得到任何答案...:)

I don't know if this is the place to ask about algorithms. But let's see if I get any answers ... :)

如果有什么不清楚我很高兴澄清的事情。

If anything is unclear I'm very happy to clarify things.

我只是实现了一个特里的蟒蛇。然而,一个位似乎比它应该(谁的人喜欢简单)更复杂。也许有人有过类似的问题?

I just implemented a Trie in python. However, one bit seemed to be more complicated than it ought to (as someone who loves simplicity). Perhaps someone has had a similar problem?

我的目标是通过存储一个子字典树的最大公共preFIX在其根部,以尽量减少节点数目。例如,如果我们有话计算器 stackbase stackbased ,然后树看起来是这样的:

My aim was to minimize the number of nodes by storing the largest common prefix of a sub-trie in its root. For example, if we had the words stackoverflow, stackbase and stackbased, then the tree would look something like this:

              [s]tack
[o]verflow ______/ \_______ [b]ase
                                  \___ [d]

请注意,一个仍然可以认为有一个字符(子节点的第一个)的边缘。

Note that one can still think of the edges having one character (the first one of the child node).

查找的-query实现起来很简单。 插入的并不难,但是较为复杂,比我想..:(

Find-query is simple to implement. Insertion is not hard, but somewhat more complex than I want.. :(

我的想法是插入钥匙一个接一个地(从空的线索开始),由第一搜索的要被插入的密钥k(的查找的(k))的,然后重新排列/拆分节点在局部地方查找,程序停止的地方。有变成为4的情况: (设K是我们要插入钥匙,和k'是节点,在此处搜索结束的键)

My idea was to insert the keys one after the other (starting from an empty trie), by first searching for the to-be-inserted key k (Find(k)), and then rearranging/splitting the nodes locally at the place where the find-procedure stops. There turn out to be 4 cases: (Let k be the key we want to insert, and k' be the key of the node, where the search ended)

  1. k是相同的K'
  2. k为k的正确的preFIX
  3. k就是k的正确的preFIX
  4. K和一些共同的preFIX,但没有的情况下,(1),(2)或(3)发生。

似乎每种情况都是唯一的,从而意味着Trie树的不同的修改。但是:是不是真的那么复杂?我失去了一些东西?有没有更好的方法?

It seems that each of the cases are unique and thus imply different modifications of the Trie. BUT: is it really that complex? Am I missing something? Is there a better approach?

感谢:)

推荐答案

一目了然,这听起来像你已经实现了一个的帕特里夏·特里。这种方法也被称为路径COM pression在一些文献。应该有该文件不属于ACM收费墙后面,其中将包括一插入算法的副本。

At a glance, it sounds like you've implemented a Patricia Trie. This approach also is called path compression in some of the literature. There should be copies of that paper that aren't behind the ACM paywall, which will include an insertion algorithm.

还有,你可能想看看另一个COM pression方法:层次COM pression。后面的路COM pression的想法是与有跳过算一个超级节点取代单一的子节点的字符串。背后级COM pression的想法是用一个超级节点与度数,上面写着更换全部或接近满子树有多少的关键节点去codeS数字。还有一个叫宽COM pression 3的方法,但我怕我的记忆失败,我,我找不到它具有快速的谷歌搜索的说明。

There's also another compression method you may want to look at: level compression. The idea behind path compression is to replace strings of single child nodes with a single super node that has a "skip" count. The idea behind level compression is to replace full or nearly full subtrees with a super node with a "degree" count that says how many digits of the key the node decodes. There's also a 3rd approach called width compression, but I'm afraid my memory fails me and I couldn't find a description of it with quick googling.

级COM pression可以缩短平均路径相当,但插入和删除算法变得非常复杂,因为他们需要管理线索节点作为类似于动态数组。对于正确的数据集,pssed级COM $ P $树木可以的快速的。从我记得,他们是第二个最快的方法,用于存储IP路由表,最快的是某种类型的哈希线索的。

Level compression can shorten the average path considerably, but insertion and removal algorithms get quite complicated as they need to manage the trie nodes as similarly to dynamic arrays. For the right data sets, level compressed trees can be fast. From what I remember, they're the 2nd fastest approach for storing IP routing tables, the fastest is some sort of hash trie.

这篇关于特里(preFIX树)在Python的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆