B树与二叉树 [英] B trees vs binary trees

查看:123
本文介绍了B树与二叉树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我要用b树实现内存中(RAM)搜索操作,那么与二叉树相比,在缓存或其他一些效果方面会更好吗?

If I am implementing in-memory(RAM) search operation with b trees, then would it be better in terms of caching or some other effects when compared with binary trees?

我所知道的是-

binary search tress---O(log n)
btrees ---------------O(c log n)

各种博客上对此都有很多讨论.

there was a lot of discussion regarding that on various blogs.

推荐答案

算法复杂度相同,因为O(log b n)= O(c log n)= O(log n)但是隐藏在big-O表示法中的常量因素可能会因实现方式和硬件的不同而有显着差异.

Algorithmic complexity is the same, since O(logb n) = O(c log n) = O(log n) but the constant factors, which are hidden in big-O notation, could vary noticeably, depending on implementation and hardware.

B树是为大容量硬盘设计的,它们具有很大的访问时间(将磁头移动到适当的位置),然后才能读取整个物理扇区.使B树节点与扇区一样大,可以最大程度地减少访问次数,并使每次读取操作中的有用数据最大化.

B-trees were designed for platter hard disks, which have a large access time (moving the head into position) after which an entire physical sector is read. Making the B-tree nodes as large as the sector minimizes the number of access times and maximizes the useful data out of each read operation.

但是,如果您的内存不足,则访问时间可以忽略不计,因此更好的比较是对算法访问的单个单词的数量进行计数.

But if you are working out of memory you have a negligible access time, therefore a better comparison is to count the number of single words accessed by your algorithm.

例如,让我们计划一个数据结构,以存储2个 20 密钥,每个密钥1个字,在32位计算机上总共存储4MiB的原始数据.

For example, let's plan a data structure to store 220 keys of 1 word each, for a total of 4MiB of raw data on a 32bit machine.

一个二叉搜索树将有2 20 个节点,每个节点拥有一个键和两个指针(3个单词).深度将为log 2 (2 20 )=20.平均搜索将必须从其路径中的每个节点(从根)读取键和指针之一一路下降= 40个字.

A binary search tree will have 220 nodes, each holding one key and two pointers (3 words). Depth will be log2(220) = 20. The average search will have to read the key and one of the pointers from each node in its path, from the root all the way down = 40 words.

用于硬盘的B树将具有4kB节点.每个节点可以在内部存储为键和指针对的排序数组,在​​其中256和512之间.平均搜索会是什么样子?考虑平均3/4填充,每个节点将包含384个条目,并且其内部二进制搜索将必须平均访问log 2 (384)= 5.95键.平均深度将为log 384 (2 20 )= 2.33,因此我们的搜索平均需要阅读2.33乘以5.95键,或大约 14个单词.

A B-tree made for hard disks will have 4kB nodes. Each node could be stored internally as a sorted array of key and pointer couples, between 256 and 512 of them. What is the average search going to look like? Considering an average 3/4 fill, each node will contain 384 entries, and its internal binary search will have to visit on average log2(384) = 5.95 keys. The average depth will be log384(220) = 2.33, so our search will have to read on average 2.33 times 5.95 keys, or about 14 words.

在低扇出(分支因子)B树的情况下,每个节点持有16至32个键,平均填充将为24个键,平均深度为log 24 ( 2 20 )= 4.36,在每个节点中的二进制搜索将进行log 2 (24)= 4.58比较,并且总体平均搜索将不得不读取大约 20个字.

In the case of a low-fanout (branching factor) B-tree, with each node holding between 16 and 32 keys, the average fill will be 24 keys, the average depth log24(220) = 4.36, the binary search in each node will make log2(24) = 4.58 comparisons, and the overall average search will have to read about 20 words.

请记住,最后两个数据结构比二叉树获得更好的结果,因为它们在修改后优化了读取操作.要将密钥插入这些B树之一中,您平均必须重写整个384个字或24个字的节点(如果不超过一个),而在二叉树的情况下,写操作仍只需要最多可以输入40个字.

Keep in mind that the last two data structures achieve a better result than binary trees because they optimize read operations over modifications. To insert a key into one of these B-trees you would have to rewrite on average an entire 384-word or 24-word node, if not more than one, while in the binary-tree case a write operation would still only need to touch up to 40 words.

(以前我做错了.感谢@virco和@Groo指出了我在评论中的错误.)

无论如何,似乎只有低扇出在实践中比二叉树表现更好.

In any case, it seems that memory-only B-trees with a low fanout appear to perform better than binary trees in practice.

32个密钥似乎是一个不错的选择.许多较新的语言和库都使用32键B树作为内置数据结构,作为哈希表和数组的替代品或替代品.这种用法是由 Clojure 和其他功能语言所带头的,但后来被更主流的语言(如Javascript)所采用,最近对不可变数据结构的关注(例如 Immutable.js )

32 keys per node in particular seems to be a sweet spot for current architectures, both 32- and 64-bit. Many newer languages and libraries are using 32-key B-trees as a built-in data structure, alongside or as a replacement for hash tables and arrays. This usage was spearheaded by Clojure and other functional languages, but was subsequently taken up by more mainstream languages such as Javascript, with the recent focus on immutable data structures (eg. Immutable.js)

不仅可以通过计算从内存中读取的字数来说明此结果,而且还可以缓存未命中,这是导致CPU停顿并等待RAM的读取操作.如果缓存体系结构一次可以获取包含整个B树节点的RAM块,那么我们将获得与成功用于基于磁盘的大容量存储相同的优化.

This result may be explained not only by counting the number of words being read from memory, but the cache misses too, which are read operations that cause the CPU to stall and wait for the RAM. If the caching architecture can fetch chunks of RAM that contain an entire B-tree node at a time, we get the same optimization that has been used sucessfully for disk-based mass storage.

对于经过硬盘优化的数据结构,我们将使用节点大小与物理磁盘扇区一样大的B树,以最大程度地减少磁盘访问时间.在这种情况下,我们使用B树,其节点的大小与3级缓存对RAM进行的读取操作一样大,以最大程度地减少缓存未命中.

In the case of hard-disk optimized data structures, we would use B-trees with nodes as large as the physical disk sector, to minimize disk access times. In this case, we are using B-trees with nodes as large as the read operation that is performed by the Level 3 cache against the RAM, to minimize cache misses.

这篇关于B树与二叉树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆