T树或B树 [英] T-Tree or B-Tree

查看:363
本文介绍了T树或B树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

本文中介绍了T-tree算法 T * -Tree是对T-tree的改进,可以更好地利用查询操作,包括范围查询,并且包含T-tree的所有其他良好功能.
本文的"T *-树:实时应用程序的主内存数据库索引结构"中描述了该算法.
根据该研究论文,当数据集适合内存时,T树比B树/B +树快. 我实现了这些论文中描述的T-Tree/T * Tree,并将其性能与B-tree/B + tree进行了比较,但在所有测试案例中,B-tree/B + tree的性能均优于T-Tree/T * Tree. (插入,删除,搜索).
我读到T-Tree是内存数据库的高效索引结构,它被Oracle TimesTen使用.但是我的结果并没有显示出来.
如果有人知道原因或对此有任何评论,很高兴收到她(或他)的来信.

T-tree algorithm is described in this paper And T*-Tree is an improvement from T-tree for better use of query operations, including range queries and which contains all other good features of T-tree.
This algorithm is described in this paper "T*-tree: A Main Memory Database Index Structure for Real-Time Applications".
According to this research paper, T-Tree is faster than B-tree/B+tree when datasets fit in the memory. I implemented T-Tree/T*Tree as they described in these papers and compared the performance with B-tree/B+tree, but B-tree/B+tree perform better than T-Tree/T*Tree in all test cases (insertion, deletion, searching).
I read that T-Tree is an efficient index structure for in-memory database, and it used by Oracle TimesTen. But my results did not show that.
If anyone may know the reason or have any comment about that, it will be great to hear from her (or him).

推荐答案

T树不是AVL树或B树一样的基本数据结构.它们只是平衡二叉树的黑客版本,因此,可能有也可能没有利基应用程序在这些应用程序中提供不错的性能.

T-Trees are not a fundamental data structure in the same sense that AVL trees or B-trees are. They are just a hacked version of balanced binary trees and as such there may or may not be niche applications where they offer decent performance.

在这个时代,由于预期的块/页面传输计数和高速缓存局部性,它们由于局部性差而必将遭受可怕的折磨.后者很明显,因为在除了最后一个搜索的所有节点访问中,都将对照搜索键检查边界值-其余所有页面都被分页或缓存为空.

In this day and age they are bound to suffer horribly because of their poor locality, both in the sense of expected block/page transfer counts and in the sense of cache locality. The latter is evident since in all node accesses of a search except for the very last one, only the boundary values will be checked against the search key - all the rest is paged in or cached for nought.

将其与一般的B树(尤其是B +树)的出色访问位置进行比较(更不用说明确考虑了内存性能特性而设计的对缓存不了解和对缓存敏感的版本了.)

Compare this to the excellent access locality of B-trees in general and B+trees in particular (not to mention cache-oblivious and cache-conscious versions that were designed explicitly with memory performance charactistics in mind).

再平衡也存在类似的问题.在B树世界中,已经开发并完善了许多变体-从B +和Blink开始,以实现所需的摊销性能特征,包括并发(锁定/锁存)或不存在这些方面.因此,大多数情况下,您可以简单地出去寻找适合您性能表现的B树形变化-或使用简单的经典B +树形,并确保获得不错的结果.

Similar problems exist with the rebalancing. In the B-tree world many variations - starting with B+ and Blink - have been developed and perfected in order to achieve desired amortised performance characteristics, including aspects like concurrency (locking/latching) or the absence thereof. So most of the time you can simply go out and find a B-tree variation that fits your performance profile - or use the simple classic B+tree and be certain of decent results.

T树比可比的B树要复杂得多,并且考虑到单层商品硬件的时代,它们似乎无法提供总体上 的性能.记忆的层次结构"已经消失了数十年.硬盘不仅是新的内存,反之亦然,而主内存现在是新的硬盘. IE.即使没有NUMA,将数据从主内存带入高速缓存层次结构的成本也很高,以至于要最大程度地减少页面传输,而这恰恰是B树及其变体所做的,而T树却没有.与处理器核心更接近的是,缓存行访问/传输的数量很重要,但是情况保持不变.

T-trees are more complicated than comparable B-trees and it seems that they have nothing to offer in the way of performance in general, given that the times of commodity hardware with a single-level memory 'hierarchy' have been gone for decades. Not only is the hard disk the new memory, the converse is also true and main memory is the new hard disk now. I.e. even without NUMA the cost of bringing data from main memory into the cache hierarchy is so high that it pays to minimise page transfers - which is precisely what B-trees and their variations do and the T-tree doesn't. Closer to the processor core it's the number of cache line accesses/transfers that matters but the picture remains the same.

实际上,如果您采用二进制搜索(这是最佳选择)的想法,并考虑以与内存层次结构(高速缓存)良好配合的方式安排搜索键的方式,那么您总是会得到一些看起来像这样的东西简直像B树一样……

In fact, if you take the idea of binary search - which is provably optimal - and think about ways of arranging the search keys in a manner that plays well with memory hierarchies (caches) then you invariably end up with something that looks uncannily like a B-tree...

如果您对性能进行编程,则会发现获胜者几乎总是位于排序后的数组,B树和哈希之间的三角形中的某个位置.即使平衡的二叉树只有在其相对较差的性能因其他考虑而退居次要并且键数相当小(即不超过一百万)的情况下才具有竞争力.

If you program for performance then you'll find that winners are almost always located somewhere in the triangle between sorted arrays, B-trees and hashing. Even balanced binary trees are only competitive if their comparatively poor performance takes the back seat in the face of other considerations and key counts are fairly small, i.e. not more than a couple million.

这篇关于T树或B树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆