Prefix search in a radix tree/patricia trie

Problem description

I'm currently implementing a radix tree/patricia trie (whatever you want to call it). I want to use it for prefix searches in a dictionary on a severely underpowered piece of hardware. It's supposed to work more or less like auto-completion, i.e. showing a list of words that the typed prefix matches.

My implementation is based on this article, but the code therein doesn't include prefix searches, though the author says:

[...] Say you want to enumerate all the nodes that have keys with a common prefix "AB". You can perform a depth first search starting at that root, stopping whenever you encounter back edges.

But I don't see how that is supposed to work. For example, if I build a radix tree from these words:

illness
imaginary
imagination
imagine
imitation
immediate
immediately
immense
in

I will get the exact same "best match" for the prefixes "i" and "in" so that it seems difficult to me to gather all matching words just by traversing the tree from that best match.

Additionally, there is a radix tree implementation in Java that has an implemented prefix search in RadixTreeImpl.java (http://code.google.com/p/radixtree/source/browse/trunk/RadixTree/src/ds/tree/RadixTreeImpl.java). That code explicitly checks all nodes (starting from a certain node) for a prefix match; it actually compares bytes.

Can anyone point me to a detailed description on implementing a prefix search on radix trees? Is the algorithm used in the Java implementation the only way to do it?

Recommended answer

Think about what your trie encodes. At each node, you have the path that led you to that node, so in your example, you start at Λ (that's a capital Lambda; this Greek font kind of sucks), the root node corresponding to the empty string. Λ has a child for each letter used, so in your data set, you have one branch, for "i".

  • Λ
  • Λ → i

At the "i" node, there are three children: one for "l" (from "illness"), one for "m", and one for "n". The next letter is "n", so you take that,

  • Λ → i → n

and since the only word that starts "i","n" in your data set is "in", there are no children from "n". That's a match.

Now, let's say the data set, instead of having "in", had "infindibulum". (What SF I'm referencing is left as an exercise.) You'd still get to the "n" node the same way, but then if the next letter you get is "q", you know the word doesn't appear in your data set at all, because there's no "q" branch. At that point, you say "okay, no match." (Maybe you then start adding the word, maybe not, depending on the application.)
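To make the walk above concrete, here is a minimal Python sketch of a plain (uncompressed) trie over the question's word list. The names `TrieNode`, `insert`, `find_node` and `words_with_prefix` are made up for this illustration and come from neither the article nor the Java implementation. It follows one branch per typed letter, gives up as soon as a branch is missing, and otherwise collects every word stored below the node it reached.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if the path from the root spells a stored word


def insert(root, word):
    """Add one word, creating a branch per letter as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def find_node(root, prefix):
    """Follow one branch per letter; return None if a branch is missing."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None  # e.g. no "q" branch under "in": no match
    return node


def words_with_prefix(root, prefix):
    """Collect every stored word at or below the node that the prefix reaches."""
    start = find_node(root, prefix)
    if start is None:
        return []
    results = []
    stack = [(start, prefix)]  # depth-first walk, carrying the path spelled so far
    while stack:
        node, path = stack.pop()
        if node.is_word:
            results.append(path)
        for ch, child in node.children.items():
            stack.append((child, path + ch))
    return sorted(results)


WORDS = ["illness", "imaginary", "imagination", "imagine", "imitation",
         "immediate", "immediately", "immense", "in"]
root = TrieNode()
for w in WORDS:
    insert(root, w)

print(words_with_prefix(root, "in"))   # ['in']
print(words_with_prefix(root, "imm"))  # ['immediate', 'immediately', 'immense']
print(words_with_prefix(root, "inq"))  # []
```

In this sketch the prefixes "i" and "in" end at different nodes, so the subtree below each one contains exactly the matching words.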

But if the next letter is "f", you can keep going. You can short circuit that with a little craft, though: once you reach a node that represents a unique path, you can hang the whole string off that node. When you get to that node, you know that the rest of the string must be "findibulum", so you've used the prefix to match the whole string, and return it.
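One way to read "hang the whole string off that node" is the usual radix-tree compression: each edge carries a multi-character label, so a unique tail such as "llness" is a single edge. The sketch below is a guess at that idea, not the code from the article or from RadixTreeImpl.java, and `RadixNode`, `common_prefix_len`, `insert` and `words_with_prefix` are hypothetical names. The one extra case compared with a plain trie is a prefix that ends in the middle of an edge label; everything below that edge still matches.

```python
class RadixNode:
    def __init__(self, is_word=False):
        self.edges = {}         # first char of label -> (label, child RadixNode)
        self.is_word = is_word  # True if the path from the root spells a stored word


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(node, word):
    while word:
        entry = node.edges.get(word[0])
        if entry is None:
            # Unique path from here on: hang the whole remainder off one edge.
            node.edges[word[0]] = (word, RadixNode(is_word=True))
            return
        label, child = entry
        k = common_prefix_len(label, word)
        if k < len(label):
            # Split the edge at the point where the label and the word diverge.
            mid = RadixNode()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[word[0]] = (label[:k], mid)
            child = mid
        node, word = child, word[k:]
    node.is_word = True  # the word ended exactly at an existing node


def words_with_prefix(root, prefix):
    node, path, rest = root, "", prefix
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None:
            return []
        label, child = entry
        if label.startswith(rest):
            rest = ""                # prefix ends on or inside this edge
        elif rest.startswith(label):
            rest = rest[len(label):]  # edge fully consumed, keep descending
        else:
            return []                # prefix and label diverge: no match
        node, path = child, path + label
    results = []
    stack = [(node, path)]
    while stack:  # depth-first walk of the matching subtree
        n, p = stack.pop()
        if n.is_word:
            results.append(p)
        for label, child in n.edges.values():
            stack.append((child, p + label))
    return sorted(results)


WORDS = ["illness", "imaginary", "imagination", "imagine", "imitation",
         "immediate", "immediately", "immense", "in"]
root = RadixNode()
for w in WORDS:
    insert(root, w)

print(words_with_prefix(root, "ima"))  # ['imaginary', 'imagination', 'imagine']
print(words_with_prefix(root, "in"))   # ['in']
```

Only edge labels along one path are compared against the typed prefix; once the prefix is used up, the subtree is enumerated without further comparisons.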

How would you use that? In a lot of non-UNIX command interpreters, like the old VAX DCL, you could use any unique prefix of a command. So, the equivalent of ls(1) was DIRECTORY, but no other command started with DIR, so you could type DIR and that was as good as typing the whole word. If you couldn't remember the correct command, you could type just 'D' and hit (I think) ESC; the DCL CLI would return all the commands that started with D, which it could search extremely fast.
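The unique-prefix behaviour can be sketched with the same kind of trie. This is an illustration only: the command list is a small sample chosen for the example, not the real DCL command table, and `build_trie`, `complete` and `resolve` are hypothetical helpers. It says nothing about how DCL actually implemented the lookup.

```python
END = ""  # marker key; it can never collide with a single-character branch


def build_trie(commands):
    """Nested-dict trie: each key is one character, END marks a complete command."""
    root = {}
    for cmd in commands:
        node = root
        for ch in cmd:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def complete(root, prefix):
    """All commands starting with prefix (what the ESC-style listing would show)."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    out = []
    stack = [(node, prefix)]
    while stack:
        n, p = stack.pop()
        for key, child in n.items():
            if key == END:
                out.append(p)
            else:
                stack.append((child, p + key))
    return sorted(out)


def resolve(root, prefix):
    """Return the command if the prefix is an exact or unique match, else None."""
    matches = complete(root, prefix)
    if prefix in matches:
        return prefix
    return matches[0] if len(matches) == 1 else None


# Illustrative sample only.
commands = build_trie(["DIRECTORY", "DELETE", "DEFINE", "COPY"])

print(resolve(commands, "DIR"))  # DIRECTORY
print(resolve(commands, "D"))    # None (ambiguous)
print(complete(commands, "D"))   # ['DEFINE', 'DELETE', 'DIRECTORY']
```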
