How do databases work internally?

Question

I've been working with databases for the last few years and I'd like to think that I've gotten fairly competent with using them. However I was reading recently about Joel's Law of Leaky Abstractions and I realised that even though I can write a query to get pretty much anything I want out of a database, I have no idea how the database actually interprets the query. Does anyone know of any good articles or books that explain how databases work internally?

Some specific things I'm interested in are:

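  • What does a database actually do to find out what matches a select statement?
  • How does a database interpret a join differently from a query with several "where key1 = key2" statements?
  • How does the database store all its memory?
  • How are indexes stored?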

Recommended answer

What does a database actually do to find out what matches a select statement?

To be blunt, it's a matter of brute force. In the simplest case, the database reads through each candidate record and evaluates the expression against its fields. So, if you have "select * from table where name = 'fred'", it literally runs through each record, grabs the "name" field, and compares it to 'fred'.
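
To make the brute-force idea concrete, here is a minimal Python sketch of a table scan. The in-memory "table" (a list of dicts) and the function name `table_scan` are hypothetical stand-ins chosen for illustration; a real engine reads rows out of disk pages, not Python objects.

```python
# Hypothetical in-memory "table": each row is a dict of field name -> value.
table = [
    {"id": 1, "name": "fred", "age": 34},
    {"id": 2, "name": "wilma", "age": 31},
    {"id": 3, "name": "fred", "age": 8},
]

def table_scan(rows, column, value):
    """Brute-force evaluation of WHERE column = value: visit every row."""
    matches = []
    for row in rows:               # read each candidate record in turn
        if row[column] == value:   # grab the field and compare it
            matches.append(row)
    return matches

result = table_scan(table, "name", "fred")
print([row["id"] for row in result])   # [1, 3]
```

The cost is proportional to the number of rows in the table, whether or not any of them match.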

Now, if the "table.name" field is indexed, then the database will (likely, but not necessarily) use the index first to locate the candidate records to apply the actual filter to.

This reduces the number of candidate records the expression has to be applied to. Without a usable index, the database just does what we call a "table scan", i.e. it reads every row.

But fundamentally, how the database locates the candidate records is separate from how it applies the actual filter expression, and, obviously, there are some clever optimizations that can be done.
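
The separation between locating candidates and filtering them can be sketched as follows. This is a toy model under stated assumptions: the "index" is a plain dict from value to row positions, and `build_index`, `candidates_by_scan`, `candidates_by_index`, and `apply_filter` are hypothetical names, not any database's API. Both access paths feed the same filter and produce the same answer; the index path simply hands the filter fewer rows.

```python
from collections import defaultdict

table = [
    {"id": 1, "name": "fred", "age": 34},
    {"id": 2, "name": "wilma", "age": 31},
    {"id": 3, "name": "fred", "age": 8},
]

def build_index(rows, column):
    """Map each value in `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index

def candidates_by_scan(rows):
    """Access path 1: every row is a candidate (a table scan)."""
    return iter(rows)

def candidates_by_index(rows, index, value):
    """Access path 2: only the rows the index points at are candidates."""
    return (rows[pos] for pos in index.get(value, []))

def apply_filter(candidates, predicate):
    """The filter step is the same whichever access path supplied the rows."""
    return [row for row in candidates if predicate(row)]

def where_clause(row):
    # WHERE name = 'fred' AND age > 18
    return row["name"] == "fred" and row["age"] > 18

name_index = build_index(table, "name")
via_scan = apply_filter(candidates_by_scan(table), where_clause)    # examines 3 rows
via_index = apply_filter(candidates_by_index(table, name_index, "fred"), where_clause)  # examines 2
assert via_scan == via_index
print([row["id"] for row in via_index])   # [1]
```

Note that the full predicate is still evaluated on the index candidates, which mirrors the point above: the index only narrows the set of records, it does not replace the filter.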

How does a database interpret a join differently from a query with several "where key1 = key2" statements?

Well, a join is used to make a new "pseudo table", upon which the filter is applied. So, you have the filter criteria and the join criteria. The join criteria are used to build this "pseudo table", and then the filter is applied against it. Now, when interpreting the join, it's again the same issue as the filter: brute-force comparisons and index reads to build the subset for the "pseudo table".
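
A rough sketch of both ways of building the join's "pseudo table", with the filter applied afterward. The tables and function names are made up for illustration. `nested_loop_join` is the brute-force comparison of every pair of rows; `index_join` builds a throwaway dict on the join key as a stand-in for an index read (similar in spirit to a hash join). Both build the same pseudo table.

```python
from collections import defaultdict

# Hypothetical tables: orders.customer_id refers to customers.id.
customers = [
    {"id": 1, "name": "fred"},
    {"id": 2, "name": "wilma"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25},
    {"order_id": 11, "customer_id": 2, "total": 90},
    {"order_id": 12, "customer_id": 1, "total": 60},
]

def nested_loop_join(left, right, left_key, right_key):
    """Brute force: compare every left row with every right row."""
    pseudo_table = []
    for l_row in left:
        for r_row in right:
            if l_row[left_key] == r_row[right_key]:   # the join criterion
                pseudo_table.append({**l_row, **r_row})
    return pseudo_table

def index_join(left, right, left_key, right_key):
    """Build an index on the right input, then probe it once per left row."""
    index = defaultdict(list)
    for r_row in right:
        index[r_row[right_key]].append(r_row)
    pseudo_table = []
    for l_row in left:
        for r_row in index.get(l_row[left_key], []):
            pseudo_table.append({**l_row, **r_row})
    return pseudo_table

# SELECT ... FROM customers JOIN orders ON customers.id = orders.customer_id
# WHERE orders.total > 50
joined = nested_loop_join(customers, orders, "id", "customer_id")
assert joined == index_join(customers, orders, "id", "customer_id")
result = [row for row in joined if row["total"] > 50]   # filter on the pseudo table
print([(row["name"], row["order_id"]) for row in result])   # [('fred', 12), ('wilma', 11)]
```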

How does the database store all its memory?

One of the keys to a good database is how it manages its I/O buffers. Basically, it maps RAM blocks to disk blocks. With modern virtual memory managers, a simpler database can almost rely on the VM as its memory buffer manager. High-end databases do all of this themselves.
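
A toy buffer manager illustrating the RAM-block-to-disk-block mapping, assuming a fixed number of frames and least-recently-used eviction (one common policy; real systems often use approximations such as clock sweep). The `disk` dict is a hypothetical stand-in for the data file, and `BufferPool` is an illustrative class, not any product's interface. Modified pages are written back to "disk" only when evicted or flushed.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches up to `capacity` disk pages in RAM frames,
    evicting the least recently used page and writing it back if dirty."""

    def __init__(self, disk, capacity):
        self.disk = disk              # page_id -> bytes; stand-in for the data file
        self.capacity = capacity
        self.frames = OrderedDict()   # page_id -> bytearray, least recent first
        self.dirty = set()            # resident pages modified since being read
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:               # buffer hit: no I/O
            self.frames.move_to_end(page_id)     # now the most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:    # miss with no free frame
            self._evict()
        self.disk_reads += 1
        page = bytearray(self.disk[page_id])     # "read" the block into a frame
        self.frames[page_id] = page
        return page

    def mark_dirty(self, page_id):
        """Record that a resident page was modified in memory."""
        if page_id not in self.frames:
            raise KeyError(page_id)
        self.dirty.add(page_id)

    def _evict(self):
        victim, page = self.frames.popitem(last=False)   # least recently used
        if victim in self.dirty:
            self.disk[victim] = bytes(page)               # write back before reuse
            self.dirty.discard(victim)

    def flush(self):
        """Write every dirty resident page back to disk."""
        for page_id in self.dirty:
            self.disk[page_id] = bytes(self.frames[page_id])
        self.dirty.clear()


disk = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
pool = BufferPool(disk, capacity=2)
pool.get_page(0)[0:1] = b"A"   # modify page 0 in RAM only
pool.mark_dirty(0)
pool.get_page(1)               # miss: frames hold pages 0 and 1
pool.get_page(0)               # hit: page 1 becomes least recently used
pool.get_page(2)               # miss: evicts clean page 1, nothing written
print(disk[0])                 # b'aaaa' (the change is still only in RAM)
pool.get_page(1)               # miss: evicts dirty page 0, which is written back
print(disk[0], pool.disk_reads)   # b'Aaaa' 4
```

Leaning on the OS virtual memory instead would amount to letting the kernel make these eviction and write-back decisions, which is simpler but gives the database less control over when data reaches disk.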

How are indexes stored?

B+Trees, typically; you should look them up. It's a straightforward technique that has been around for years. Its benefit is shared with almost any balanced tree: consistent access time to the nodes, because the tree stays balanced. In addition, all the leaf nodes are linked, so you can easily traverse from node to node in key order. So, with an index, the rows can be considered "sorted" on specific fields, and the database can exploit that ordering for optimizations. This is distinct from, say, using a hash table for an index, which only lets you get to a specific record quickly. In a B+Tree you can quickly get not just to a specific record, but to a point within a sorted list.
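
A small, read-only B+Tree sketch showing both properties described above: one descent through the inner nodes to the right leaf, then a walk along the linked leaves in key order. Assumptions: everything fits in memory, the tree is bulk-loaded once from `(key, row_id)` pairs (no inserts, deletes, or node splits), and `bulk_load`, `find_leaf`, `range_scan`, and `fanout` are hypothetical names. A dict keyed on the same values would answer the `42` lookup just as well, but could not answer the `20..40` range without examining every key.

```python
import bisect

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys stored in this leaf
        self.values = values    # row ids, parallel to keys
        self.next = None        # link to the next leaf in key order

class Inner:
    def __init__(self, keys, children):
        self.keys = keys        # keys[j] is the smallest key under children[j + 1]
        self.children = children

def bulk_load(pairs, fanout=4):
    """Build a read-only B+Tree bottom-up from a non-empty list of (key, row_id)."""
    assert pairs and fanout >= 2
    pairs = sorted(pairs)
    leaves = [Leaf([k for k, _ in pairs[i:i + fanout]],
                   [v for _, v in pairs[i:i + fanout]])
              for i in range(0, len(pairs), fanout)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right       # chain the leaves together in key order
    level = leaves
    mins = [leaf.keys[0] for leaf in leaves]   # smallest key under each node
    while len(level) > 1:       # add inner levels until a single root remains
        starts = range(0, len(level), fanout)
        level = [Inner(mins[i + 1:i + fanout], level[i:i + fanout]) for i in starts]
        mins = [mins[i] for i in starts]
    return level[0]

def find_leaf(root, key):
    node = root
    while isinstance(node, Inner):
        # bisect_left picks the leftmost child that could hold keys >= key
        node = node.children[bisect.bisect_left(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Yield row ids with lo <= key <= hi: descend once, then walk the leaves."""
    leaf = find_leaf(root, lo)
    i = bisect.bisect_left(leaf.keys, lo)
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > hi:
                return
            yield leaf.values[i]
            i += 1
        leaf, i = leaf.next, 0  # follow the leaf link; no need to climb the tree

ages = [34, 31, 8, 42, 19, 27, 65, 50, 12, 23]   # hypothetical column values
root = bulk_load([(age, row_id) for row_id, age in enumerate(ages)], fanout=3)
print(list(range_scan(root, 20, 40)))   # [9, 5, 1, 0]  (ages 23, 27, 31, 34)
print(list(range_scan(root, 42, 42)))   # [3]
```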

The actual mechanics of storing and indexing rows in the database are really pretty straightforward and well understood. The real game is managing buffers and converting SQL into efficient query paths that leverage these basic storage idioms.

Then, on top of the storage idioms, there is the whole complexity of multiple users, locking, logging, and transactions.
