数据库如何在B树/ B +树中内部存储数据 [英] How Database stores data internally in B-Tree/B+Tree

查看:272
本文介绍了数据库如何在B树/ B +树中内部存储数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的问题是数据库如何存储数据以及如何在内部执行查询。



假设我们的表中有以下字段:


  1. ID

  2. 姓名

  3. 年龄

  4. 重量

  5. 经理

,我们查询 select * from Table1 age> 50 and weight< 100



我只是想知道如何在内部执行查询。



在此示例中,B-Tre / B + Tree的节点将包含什么?

解决方案

您选择的示例是单个树不能完成该作业(两个独立范围)的少数情况之一。



但是,我的工作进程电子书的第一章解释了B-Tree索引的内部工作原理: http://use-the-index-luke.com/anatomy/



编辑了解更多详情,



上面的查询可以通过三种可能的索引配置来支持:

    然后 WEIGHT (按此顺序)中的 b

    在这种情况下,查询将读取所有记录 WHERE AGE> 50 ,然后按 WEIGHT 过滤。


  1. c $ c> WEIGHT ,然后 AGE (另一个顺序)。

    换句话说: WHERE WEIGHT< 100 ,然后按 AGE 过滤。



$ b b

什么是更高效的取决于你拥有的数据。如果有少量记录 AGE> 50 100 第一个会更高效,否则第二个。但是,如果您使用不同的值查询,则图片可能会更改。



连接索引无法支持查询的原因是每个索引顺序都在一个轴。每个索引条目在另一个之前或之后,但从不在它旁边。所有索引条目构建一个链。



具有两个独立范围查询的查询将需要两个轴,而不是像一个链,但更像一个棋盘。一个轴为 AGE 另一个为 WEIGHT 。如果这是可能的,你的查询将只需要扫描棋盘的一个角落。



然而,一棵b树只有一个轴,因此你必须选择哪个标准首先使用。如果选择 AGE ,则意味着从 AGE 50 开始,整个链将被扫描,直到结束。只有存储在链的一侧的一些记录也将有资格 WEIGHT < 100 ,其他记录必须被读取,但会被丢弃。



所以,一个长的故事来解释为什么一棵树不能支持查询具有两个独立范围子句。另一方面,一个连接索引可以很好地进行以下操作:

  WHERE age = 50 AND weight< 100 
WHERE weight = 100 AND age> 50
WHERE age> 50 AND age< 70;

但是,如果在两个不同的列上使用两个不等式运算符, >

那么,该怎么办?



第三种可能的方法是在两列上有两个独立的索引。这允许有你喜欢的轴数(只是创建更多的索引)。但是,有一些巨大问题。首先,并不是所有的DB产品都支持。每当支持它,它是一个相当广泛的操作。它通常以每个索引被扫描的方式工作,为每个结果构建一个位图索引。这些位图索引然后被连接以应用 AND 运算符。这是很多数据ing - 只有值得的努力,如果每个条件不是很有选择性,因为它是自己的,但两者一起是非常有选择性。



Wan't我的建议?



如果您的查询在OLTP环境中运行:使用一个连接索引。
两个独立的索引是最后的选择。



ps:
索引 AGE 是一个练习 - 特别是因为存储 AGE 是一个坏实践中,您应该存储出生日期。


My question is that How database stores data and how it performs query internally.

Suppose we have following fields in our table:

  1. ID
  2. Name
  3. Age
  4. Weight
  5. Manager

and we query select * from Table1 where age>50 and weight<100

I am just curious that how it perform query internally.

What will the Node of B-Tre/B+Tree contains in this example?

解决方案

The example you have chosen is one of the few cases where a single Tree can't do the job (two independent ranges).

However, the first chapter of my work-in-progress e-Book explains the inner workings of B-Tree indexes: http://use-the-index-luke.com/anatomy/

EDIT for more details why two indexes might be useful for the above example.

The above query can be supported by three possible index configurations:

  1. concatenated index on AGE and then WEIGHT (in this order).
    In case, the query would read all records WHERE AGE > 50 and then filter by WEIGHT.

  2. concatenated index on WEIGHT and then AGE (the other order).
    That goes the different way: read all records WHERE WEIGHT < 100 and then filter by AGE.

Whatever is more efficient depends on the data you have. If there are less records AGE > 50 than WEIGHT < 100 the first will be more efficient, otherwise the second. However, if you query with different values, the picture might change.

The reason that a concatenated index can't support the query well is that each index order is on one axis only. each index entry is before or after another one, but never next to it. All index entries build one chain.

A query that has two independent range queries would require two axes, not like a chain, but more like a chess board. one axis for AGE the other for WEIGHT. If that would be possible, your query would need to scan only one corner of the chess board.

However, a b-tree has only one axis, hence you must chose which criteria to use first. If you chose AGE it means that starting with AGE 50, the entire chain will be scanned until the end. Only some of the records stored at the side of the chain will also qualify for WEIGHT < 100, the other records must be read but will be discarded.

So, a long story to explain why one tree can not support a query with two independent range clauses. On the other hand, one concatenated index can do the following quite well:

WHERE age = 50 AND weight < 100
WHERE weight = 100 AND age > 50
WHERE age > 50 AND age < 70;

However, the problem arises if there are two inequality operators are used on two different columns.

So, what to do?

The third possible approach is to have two independent indexes on the two columns. That allows to have as many axes as you like (just create more indexes). However, there are a few huge problems with that. First of all, not all DB products support that. Whenever it is supported, it is a rather expansive operation. It works typically that way that each index is scanned, a bitmap index is built for each result. Those bitmap indexes are then joined to apply the AND operator. That's a lot of data munging--it is only worth the effort if each condition is not very selective for it's own, but both together are very selective.

Wan't my advice?

If your query runs in an OLTP environment: use one concatenated index. Two independent indexes are an option of last resort only. However, if you are working in an OLAP environment, you might anyways need bitmap indexes.

ps.: Indexing AGE was an exercise in my book (with solution)--especially because storing AGE is a bad practice, you should store the date of birth instead.

这篇关于数据库如何在B树/ B +树中内部存储数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆