数据库索引B树和列表 [英] Database Indexes B-Trees and Lists

查看:181
本文介绍了数据库索引B树和列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何人都可以解释为什么数据库倾向于使用b树索引而不是有序元素的链表。

Can anyone explain why databases tend to use b-tree indexes rather than a linked list of ordered elements.

我的想法是:在B + Tree(大多数数据库使用)上,非叶子节点是指向其他节点的指针集合。每个集合(节点)都是有序列表。叶节点是所有数据指针所在的位置,是数据指针簇的链表。

My thinking is this: On a B+ Tree (used by most databases), the none-leaf nodes are a collection of pointers to other nodes. Each collection (node) is a ordered list. The leaf nodes, which is where all the data pointers are, is a linked list of clusters of data pointers.

非叶​​节点仅用于查找目标数据指针所在的正确叶节点。因为叶节点就像一个链表,那么为什么不只是取消树元素而只是拥有链表。可以提供元数据,其给出每个叶节点集群的最小值和最大值,因此应用程序可以只读取元数据并找到数据指针所在的正确叶。

The non-leaf nodes are just used to find the correct leaf node in which your target data pointer lives. So as the leaf nodes are just like a linked list, then why not just do away with the tree elements and just have the linked list. Meta data can be provided which gives the minimum and maximum value of each leaf node cluster, so the application can just read the meta data and find the correct leaf where the data pointer lives.

为了清楚,用于搜索随机访问的有序列表的最有效的算法是二进制搜索,其性能为O(log n),与一棵b树。使用链表而不是树的好处是它们不需要平衡。

Just to be clear that the most efficent algorithm for searching an random accessed ordered list is an binary search which has a performance of O(log n) which is the same as a b-tree. The benifit of using a linked list rather than a tree is that they don't need to be ballanced.

这种结构是否可行。

推荐答案

经过一些研究和论文阅读后,我找到了答案。

After some research and paper reading I found the answer.

为了应付大笔金额对于如此数百万条记录的数据,必须将索引组织成簇。群集是磁盘上的一组连续扇区,可以快速读入内存。这些通常大约4096字节。

In order to cope with large amounts of data such a millions of records, indexes have to be organised into clusters. A cluster is a continuous group of sectors on a disk that can be read into memory quickly. These are usually about 4096 bytes long.

这些集群中的每一个都可以包含一堆索引,这些索引可以指向磁盘上的其他集群或数据。因此,如果我们有一个链表索引,索引的每个元素都将由一个集群中包含的索引集合组成(比如说100)。

Each one of these clusters can contain a bunch of indexes which can point to other clusters or data on a disk. So if we had a linked list index, each element of the index would be made up of the collection of indexes contained in a single cluster (say 100).

因此,当我们寻找特定记录时,我们如何知道它所在的群集。我们执行二进制搜索以找到有问题的集群[O(log n)]。

So, when we are looking for a specific record, how do we know which cluster it is on. We perform a binary search to find the cluster in question [O(log n)].

但是,要进行二分查找,我们需要知道每个聚类中值的范围,所以我们需要元数据来说明每个聚类的最小值和最大值。群集以及该群集的位置。这很棒。除非每个集群可以包含100个索引,并且我们的元数据也保存在单个集群上(为了速度),那么我们的元数据只能指向100个集群。

However, to do a binary search we need to know where the range of values in each clusters, so we need meta-data that says the min and max value of each cluster and where that cluster is. This is great. Except if each cluster can contain 100 indexes, and our meta data is also held on a single cluster (for speed) , then our meta data can only point to 100 clusters.

如果我们想要超过100个集群会发生什么。我们必须有两个元数据索引,每个索引指向100个集群(10 000个记录)。那还不够。让我们添加另一个元数据集群,现在我们可以访问1 000 000条记录。那么我们如何知道我们需要查询三个元数据集群中的哪一个才能找到我们的目标数据集群。我们可以搜索一个然后搜索另一个,但这不会扩展。所以我添加了另一个元元数据集群来指示我应该查询哪三个元数据集群以找到目标数据集群。现在我有了一棵树!

What happens if we want more than 100 clusters. We have to have two meta-data indexes, each pointing to 100 clusters (10 000 records). Well that’s not enough. Lets add another meta-data cluster and we can now access 1 000 000 records. So how do we know which one of the three meta-data clusters we need to query in order to find our target data cluster. We could search one then the other, but that doesn’t scale. So I add another meta-meta-data cluster to indicate which one of the three meta-data clusters I should query to find the target data cluster. Now I have a tree!

这就是数据库使用树的原因。它不是速度,而是索引的大小以及索引引用其他索引的需要。我上面描述的是B + Tree - 子节点包含对其他子节点或叶节点的引用,叶节点包含对磁盘上数据的引用。

So that’s why databases use trees. It’s not the speed it’s the size of the indexes and the need to have indexes referencing other indexes. What I have described above is a B+Tree – child nodes contain references to other child nodes or leaf nodes, and leaf nodes contain references to data on disk.

Phew!

这篇关于数据库索引B树和列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆