特里vs B +树 [英] Trie vs B+ tree
问题描述
Trie和B +树如何比较按字典顺序排序的字符串的索引(大约数十亿)?
它也应该支持范围查询。
How does Trie and B+ tree compare for indexing lexicographically sorted strings [on the order some billions]? It should support range queries as well.
来自perf。
From perf. as well as implementation complexity point of view.
推荐答案
我会说这取决于您所说的 Range em>。
I would say it depends on what you mean by Range.
如果您的范围表示为所有以开头的单词,则为 Trie $ c $我会说c>是正确的选择。另一方面,
Trie
并不用于 XX和ZZ之间的所有单词。
If your range is expressed as All words beginning by, then a Trie
is the right choice I'd say. On the other hand, Trie
are not meant for requests like All words between XX and ZZ.
请注意, B +树
的分支因子会影响其性能(中间节点的数量)。如果 h
是树的高度,则n max ~~ b h 。因此h ~~ log(n max )/ log(b)。
Note that the branching factor of the B+ Tree
affects its performance (the number of intermediary nodes). If h
is the height of the tree, then nmax ~~ bh. Therefore h ~~ log(nmax) / log(b).
其中 n = 1 000 000 000
和 b = 100
,我们有 h ~~ 5
。因此,这意味着从根到叶只有5个指针取消引用。它比 Trie
更易于缓存。
With n = 1 000 000 000
and b = 100
, we have h ~~ 5
. Therefore it means only 5 pointer dereferencing for going from the root to the leaf. It's more cache-friendly than a Trie
.
最后, B + Tree $相对于
Trie
,c $ c>的实现难度更大:在 Red-Black Tree
级别复杂。
Finally, B+ Tree
is admittedly more difficult to implement than a Trie
: it's more on a Red-Black Tree
level of complexity.
这篇关于特里vs B +树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!