B树与哈希表 [英] B-Tree vs Hash Table

查看:115
本文介绍了B树与哈希表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在MySQL中,索引类型为b树,访问b树中的元素的时间为对数摊销时间O(log(n)).

In MySQL, an index type is a b-tree, and access an element in a b-tree is in logarithmic amortized time O(log(n)).

另一方面,在O(1)中访问哈希表中的元素.

On the other hand, accessing an element in a hash table is in O(1).

为什么不使用哈希表代替b树来访问数据库中的数据?

Why is a hash table not used instead of a b-tree in order to access data inside a database?

推荐答案

您只能通过哈希表中的主键访问元素. 这比使用树算法( O(1)而不是log(n) )要快,但是您不能选择范围( xy 之间的所有内容) . 树算法在Log(n)中支持此功能,而哈希索引可以导致全表扫描O(n). 同样,哈希索引的恒定开销通常会更大(在theta表示法中没有影响,但它仍然存在). 而且树形算法通常更易于维护,随数据增长,扩展等.

You can only access elements by their primary key in a hashtable. This is faster than with a tree algorithm (O(1) instead of log(n)), but you cannot select ranges (everything in between x and y). Tree algorithms support this in Log(n) whereas hash indexes can result in a full table scan O(n). Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists). Also tree algorithms are usually easier to maintain, grow with data, scale, etc.

哈希索引使用预定义的哈希大小,因此最终会得到一些存储对象的存储桶".这些对象又被循环,以在该分区中真正找到正确的对象.

Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" where the objects are stored in. These objects are looped over again to really find the right one inside this partition.

因此,如果尺寸较小,则较小的元素会有很多开销,较大的尺寸会导致进一步的扫描.

So if you have small sizes you have a lot of overhead for small elements, big sizes result in further scanning.

今天,哈希表算法通常可以扩展,但是扩展效率很低.

Todays hash tables algorithms usually scale, but scaling can be inefficient.

确实有可扩展的哈希算法.不要问我这是怎么回事-这对我来说也是一个谜. AFAIK是从不容易重新哈希化的可伸缩复制演变而来的.

There are indeed scalable hashing algorithms. Don't ask me how that works - its a mystery to me too. AFAIK they evolved from scalable replication where re-hashing is not easy.

它称为 RUSH - R 复制 U nder S 可缩放的 H 灰化,因此这些算法称为RUSH算法.

Its called RUSH - Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.

但是,在某些情况下,与散列大小相比,您的索引超出了可容忍的大小,因此需要重新构建整个索引.通常这不是问题,但是对于巨大的数据库,这可能需要几天的时间.

However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.

树算法的权衡很小,它们几乎适用于所有用例,因此是默认的.

The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.

但是,如果您有一个非常精确的用例,并且确切知道将只需要什么,则可以利用哈希索引.

However if you have a very precise use case and you know exactly what and only what is going to be needed, you can take advantage of hashing indexes.

这篇关于B树与哈希表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆