How to Optimize Queries in a Database - The Basics

Question

It seems that all questions regarding this topic are very specific, and while I value specific examples, I'm interested in the basics of SQL optimization. I am very comfortable working in SQL, and have a background in hardware/low level software.

What I want is the tools, both tangible software and a method, to look at the MySQL databases I work with on a regular basis and understand what difference the order of join clauses and where clauses makes.

I want to know why an index helps, like, exactly why. I want to know specifically what happens differently, and I want to know how I can actually look at what is happening. I don't need a tool that will break down every step of my SQL; I just want to be able to poke around, and if nobody can tell me which column to index, I want to be able to get out a sheet of paper and work out the answer myself within a reasonable amount of time.

Databases are complicated, but they aren't THAT complicated, and there must be some great material out there for learning the basics, so that you know how to find the answers to the optimization problems you encounter, even if you could hunt down the exact answer on a forum.

Please recommend some reading that is concise, intuitive, and not afraid to get down to the low level nuts and bolts. I prefer online free resources, but if a book recommendation demolishes the nail head it hits I'd consider accepting it.

Recommended answer

You have to do a lookup for every where condition and for every join...on condition. The two work the same way.

Suppose we write

select name
from customer
where customerid=37;

Somehow the DBMS has to find the record or records with customerid=37. If there is no index, the only way to do this is to read every record in the table comparing the customerid to 37. Even when it finds one, it has no way of knowing there is only one, so it has to keep looking for others.

If you create an index on customerid, the DBMS has ways to search the index very quickly. It's not a sequential search but, depending on the database, a binary search or some other efficient method (in many databases, a B-tree traversal). Exactly how doesn't matter; just accept that it's much faster than a sequential search. The index then takes the DBMS directly to the appropriate record or records. Furthermore, if you specify that the index is "unique", the database knows there can be only one match, so it doesn't waste time looking for a second. (And the DBMS will prevent you from adding a second.)
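To make the difference concrete, here is a minimal Python sketch of the two strategies. It is an illustration only, not how any particular DBMS is implemented: the eight-row table, the `scan` and `index_lookup` functions, and the sorted list standing in for an index are all assumptions made up for this example.

```python
import bisect

# Hypothetical eight-row customer table, stored in arrival order (unsorted).
rows = [(cid, f"customer{cid}") for cid in (52, 7, 37, 91, 15, 64, 3, 88)]

def scan(rows, customerid):
    """No index: compare every row, because a later row might also match."""
    matches, comparisons = [], 0
    for row in rows:
        comparisons += 1
        if row[0] == customerid:
            matches.append(row)
    return matches, comparisons

# Stand-in for an index: sorted (key, row position) pairs.
# A real DBMS keeps this structure up to date on every insert, update and delete.
index = sorted((row[0], pos) for pos, row in enumerate(rows))

def index_lookup(rows, index, customerid):
    """Binary search the sorted keys, then jump straight to the matching rows."""
    i = bisect.bisect_left(index, (customerid, -1))
    matches = []
    while i < len(index) and index[i][0] == customerid:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

print(scan(rows, 37))                 # touches all 8 rows to be sure
print(index_lookup(rows, index, 37))  # about log2(8) probes, then a direct fetch
```

The scan always pays for every row, while the binary search needs roughly log2(n) probes, which is why the gap widens as the table grows.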

Now consider this query:

select name
from customer
where city='Albany' and state='NY';

Now we have two conditions. If you have an index on only one of those fields, the DBMS will use that index to find a subset of the records, then sequentially search those. For example, if you have an index on state, the DBMS will quickly find the first record for NY, then sequentially search looking for city='Albany', and stop looking when it reaches the last record for NY.

If you have an index that includes both fields, i.e. "create index on customer (state, city)", then the DBMS can immediately zoom to the right records.
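One way to watch this happen is to ask a database for its plan under each index. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in so it runs without a server; MySQL's planner and its EXPLAIN output differ in detail. The index names `idx_state_only` and `idx_state_city`, the sample rows, and the `plan_with_index` helper are hypothetical.

```python
import sqlite3

QUERY = "select name from customer where city = 'Albany' and state = 'NY'"

def plan_with_index(index_sql):
    """Build a fresh in-memory customer table, create one index, return the plan steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table customer (customerid integer, name text, city text, state text)"
    )
    conn.executemany(
        "insert into customer values (?, ?, ?, ?)",
        [(1, "Ann", "Albany", "NY"), (2, "Bob", "Buffalo", "NY"), (3, "Cy", "Albany", "CA")],
    )
    conn.execute(index_sql)
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    details = [row[-1] for row in conn.execute("explain query plan " + QUERY)]
    conn.close()
    return details

state_only = plan_with_index("create index idx_state_only on customer (state)")
state_city = plan_with_index("create index idx_state_city on customer (state, city)")
print(state_only)  # the step names the single-column index
print(state_city)  # the step names the two-column index
```

With the single-column index the remaining condition has to be checked row by row; with the two-column index both conditions can be resolved inside the index itself.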

If you have two separate indexes, one on each field, the DBMS will have various rules that it applies to decide which index to use. Again, exactly how this is done depends on the particular DBMS you are using, but basically it keeps statistics on the total number of records, the number of different values, and the distribution of values, and picks the index it expects to narrow the search the most. It then searches the records found through that index sequentially for the ones that satisfy the other condition. In this case the DBMS would probably observe that there are many more cities than there are states, so by using the city index it can quickly zoom to the 'Albany' records. Then it will sequentially search these, checking the state of each against 'NY'. If you have records for Albany, California, these will be skipped.
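The choice between the two indexes can be approximated on paper with simple arithmetic. The following Python sketch is a deliberately crude model, not any real optimizer's cost formula: it assumes values are spread evenly, and the row count and distinct-value counts are invented numbers.

```python
def estimated_rows(total_rows, distinct_values):
    """Average rows per distinct value: a crude estimate assuming an even spread."""
    return total_rows / distinct_values

def choose_index(total_rows, distinct_by_index):
    """Pick the index expected to leave the fewest rows to check sequentially."""
    return min(
        distinct_by_index,
        key=lambda name: estimated_rows(total_rows, distinct_by_index[name]),
    )

# Made-up statistics for a 1,000,000-row customer table.
stats = {"state": 50, "city": 20_000}
best = choose_index(1_000_000, stats)
print(best, estimated_rows(1_000_000, stats[best]))  # city 50.0
```

Under these assumed numbers the state index would leave about 20,000 rows to check, while the city index leaves about 50, which is why the city index wins.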

Every join requires some sort of look-up.

Say we write

select customer.name
from transaction
join customer on transaction.customerid=customer.customerid
where transaction.transactiondate='2010-07-04' and customer.type='Q';

Now the DBMS has to decide which table to read first, select the appropriate records from there, and then find the matching records in the other table.

If you had an index on transaction.transactiondate and customer.customerid, the best plan would likely be to find all the transactions with this date, and then for each of those find the customer with the matching customerid, and then verify that the customer has the right type.

If you don't have an index on customer.customerid, then the DBMS could quickly find the transactions, but then for each transaction it would have to sequentially search the customer table looking for a matching customerid. (This would likely be very slow.)

Suppose instead that the only indexes you have are on transaction.customerid and customer.type. Then the DBMS would likely use a completely different plan. It would probably scan the customer table for all customers with the correct type, then for each of these find all transactions for this customer, and sequentially search them for the right date.
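The two index setups described above can be compared side by side by asking a database for its plan under each. This is a sketch using Python's built-in `sqlite3` as a stand-in, so the planner's choices and the output wording are SQLite's, not MySQL's. The index names and the `join_plan` helper are hypothetical, and the tables are left empty because only the plan is inspected.

```python
import sqlite3

QUERY = """
select customer.name
from "transaction"
join customer on "transaction".customerid = customer.customerid
where "transaction".transactiondate = '2010-07-04' and customer.type = 'Q'
"""

def join_plan(index_statements):
    """Create both tables with the given indexes and return the plan steps in order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customer (customerid integer, name text, type text)")
    # "transaction" is quoted because the word is also an SQL keyword.
    conn.execute(
        'create table "transaction" '
        "(transactionid integer, customerid integer, transactiondate text)"
    )
    for statement in index_statements:
        conn.execute(statement)
    details = [row[-1] for row in conn.execute("explain query plan " + QUERY)]
    conn.close()
    return details

plan_a = join_plan([
    'create index idx_tx_date on "transaction" (transactiondate)',
    "create index idx_cust_id on customer (customerid)",
])
plan_b = join_plan([
    'create index idx_tx_cust on "transaction" (customerid)',
    "create index idx_cust_type on customer (type)",
])
print(plan_a)  # steps are listed in the order the tables are visited
print(plan_b)
```

Reading the steps top to bottom shows which table each plan starts from and which index it uses to find the matching rows in the other table.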

The most important key to optimization is to figure out what indexes will really help and create those indexes. Extra, unused indexes are a burden on the database because it takes work to maintain them, and if they're never used this is wasted effort.

You can tell what indexes the DBMS will use for any given query with the EXPLAIN command. I use this all the time to determine if my queries are being optimized well or if I should be creating additional indexes. (Read the documentation on this command for an explanation of its output.)
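As a rough illustration of that workflow, the sketch below checks the plan for the first query before and after adding an index. It uses SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module so that it runs anywhere; in MySQL you would prefix the query with `EXPLAIN` instead, and the output is a table of columns rather than these one-line steps. The index name `idx_customer_id`, the sample data, and the `plan` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer (customerid integer, name text, city text, state text)"
)
conn.executemany(
    "insert into customer values (?, ?, ?, ?)",
    [(i, f"name{i}", "Albany", "NY") for i in range(100)],
)

def plan(conn, sql):
    """Return the human-readable step text from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

query = "select name from customer where customerid = 37"

before = plan(conn, query)  # no index yet: expect a full-table SCAN step
conn.execute("create index idx_customer_id on customer (customerid)")
after = plan(conn, query)   # now the step should name idx_customer_id

print(before)
print(after)
```

If the plan after adding an index still shows a scan, that is the signal that the index does not match the query's conditions.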

Caveat: Remember that I said the DBMS keeps statistics on the number of records, the number of different values, and so on in each table. EXPLAIN may give you a completely different plan today than it gave you yesterday if the data has changed. For example, if a query joins two tables and one of them is very small while the other is large, the DBMS will be biased toward reading the small table first and then finding matching records in the large one. Adding records to a table can change which is larger, and thus lead the DBMS to change its plan. So you should run EXPLAIN against a database with realistic data. Running it against a test database with 5 records in each table is of far less value than running it against a live database.
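The statistics themselves can usually be inspected. As a sketch, SQLite stores them in a table called `sqlite_stat1` once you run `ANALYZE`; MySQL has its own equivalents (`ANALYZE TABLE` to refresh them and `SHOW INDEX` to see the cardinality estimates). The table contents and the index name `idx_city` below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer (customerid integer, name text, city text, state text)"
)
conn.execute("create index idx_city on customer (city)")

cities = ["Albany", "Buffalo", "Syracuse", "Utica"]
conn.executemany(
    "insert into customer values (?, ?, ?, 'NY')",
    [(i, f"name{i}", cities[i % 4]) for i in range(1000)],
)

# ANALYZE gathers per-index statistics that the planner consults when choosing a plan.
conn.execute("analyze")
stats = conn.execute("select tbl, idx, stat from sqlite_stat1").fetchall()
print(stats)  # one row per index: table name, index name, row-count summary
```

Because these numbers are recomputed from the actual data, a near-empty test database produces statistics, and therefore plans, that say little about production.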

Well, there's much more that could be said, but I don't want to write a book here.
