When to add an index on a SQL table field (MySQL)?

Question

I've been told that if you know you will be frequently using a field for joins, it may be good to create an index on it.

I generally understand the concept of indexing a table (much like an index in a paper book allows you to look up a particular term without having to search page by page). But I'm less clear about when to use them.

Let's say I have three tables: USERS, COMMENTS, and VOTES. I want to build a Stack Overflow-like comment thread where the query returns the comments along with the number of up and down votes on each one.

USERS table
user_id user_name   
 1         tim
 2         sue
 3         bill 
 4         karen
 5         ed

COMMENTS table
comment_id topic_id    comment   commenter_id
 1            1       good job!         1
 2            2       nice work         2
 3            1       bad job :)        3

VOTES table
 vote_id    vote  comment_id  voter_id
  1          -1       1          5
  2           1       1          4
  3           1       3          1
  4          -1       2          5
  5           1       2          4

Here's the query (and SQLFiddle) that returns the votes for topic_id = 1:

select u.user_id, u.user_name,
   c.comment_id, c.topic_id, c.comment,
   count(v.vote) as totals, sum(v.vote > 0) as yes, sum(v.vote < 0) as no,
   my_votes.vote as did_i_vote
from comments c
join users u on u.user_id = c.commenter_id
left join votes v on v.comment_id = c.comment_id
left join votes my_votes on my_votes.comment_id = c.comment_id
and my_votes.voter_id = 1
where c.topic_id = 1
group by c.comment_id, u.user_id, u.user_name, c.topic_id, c.comment, did_i_vote;

Suppose the number of comments and votes grows into the millions. To speed up the query, should I put indexes on comments.commenter_id, votes.voter_id, and votes.comment_id?

Answer

Here's an updated fiddle with some keys that actually get used: http://www.sqlfiddle.com/#!2/94daa/1

The engine compares the cost of using an index with the cost of not using one. You'll notice I had to add more rows before the indexes were used.

With an index, the engine first reads the index to find the matching values, which is fast. It then uses those matches to look up the actual rows in the table. If the index doesn't narrow down the number of rows much, it can be faster to simply read every row in the table.
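
A toy cost model makes this trade-off concrete. It uses the same "one row per data page" simplification the answer adopts below. The function names and the cost formula are hypothetical illustrations, not how MySQL's optimizer actually computes costs.

```python
import math

def scan_cost(n_rows: int) -> int:
    """Pages read by a full table scan, assuming one row per data page."""
    return n_rows

def index_cost(n_rows: int, n_matches: int) -> int:
    """Pages read via the index: walk the tree, then fetch each matching row.

    Assumes a balanced binary tree (about log2(n) + 1 index pages on the
    root-to-leaf path) plus one data page per matching row. It ignores the
    extra leaf pages a range scan would touch, so it flatters the index.
    """
    depth = math.ceil(math.log2(n_rows)) + 1
    return depth + n_matches

def choose_plan(n_rows: int, n_matches: int) -> str:
    """Pick whichever access path reads fewer pages under this toy model."""
    return "index" if index_cost(n_rows, n_matches) < scan_cost(n_rows) else "scan"

print(choose_plan(1024, 1))     # selective: one matching row
print(choose_plan(1024, 1024))  # unselective: every row matches
```

When a predicate matches nearly every row, the per-row lookups outweigh the savings from the tree walk. That is why the engine only picks the index once the data makes it selective enough.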

I'm not sure whether MySQL has something similar to SQL Server's clustered indexes. With a clustered index, the index and the table data live in the same structure, so there is no second step to the index lookup.
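
To make the "second step" concrete, here is a toy model in which plain Python dicts stand in for index trees and data pages. All names and page labels are hypothetical. A nonclustered index maps a key to a row location that must then be fetched, while a clustered index keeps the row itself alongside the key.

```python
# Toy model only: dicts stand in for index trees and data pages.
# The page labels ("page1", ...) and all names here are hypothetical.
data_pages = {"page1": (1, "tim"), "page2": (2, "sue"), "page3": (3, "bill")}

# Nonclustered index: key -> location of the row in the table.
nonclustered_index = {1: "page1", 2: "page2", 3: "page3"}

# Clustered index: key -> the row itself (index and table are one structure).
clustered_index = {1: (1, "tim"), 2: (2, "sue"), 3: (3, "bill")}

def lookup_nonclustered(user_id):
    """Return (row, structures_read): the index, then the data page it points to."""
    location = nonclustered_index[user_id]
    return data_pages[location], 2

def lookup_clustered(user_id):
    """Return (row, structures_read): the index entry already holds the row."""
    return clustered_index[user_id], 1

print(lookup_nonclustered(3))
print(lookup_clustered(3))
```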

I introduced indexes in two different ways. The first was on the users table, by defining a primary key. This implicitly creates a unique index on the user_id column. A unique index means you cannot insert the same set of values twice; for a single-column index, that just means the same value can't appear twice.
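
A minimal sketch of that behaviour follows. It uses an in-memory SQLite database as a stand-in for MySQL. That substitution is an assumption: the engines differ in detail, but a primary key is backed by a unique index in both. `try_insert` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Declared as INT rather than INTEGER so SQLite builds a separate automatic
# unique index for the primary key (INTEGER PRIMARY KEY would alias the rowid).
conn.execute("CREATE TABLE users (user_id INT PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'tim')")

# List the indexes SQLite created implicitly; the third field is the 'unique' flag.
print(conn.execute("PRAGMA index_list('users')").fetchall())

def try_insert(user_id, user_name):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, user_name))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(2, "sue"))    # new user_id: accepted
print(try_insert(1, "again"))  # duplicate user_id: rejected
```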

If you imagine the table as a book of users, with one user per page, then the index gives you a sorted list of user_ids, each with the page number of that user. The list is usually stored as some kind of tree so that looking up a particular number is fast. Think about how you look up a name in a phone book: you don't scan every page until you find it; you guess where it will be, then skip back or forward by chunks of pages until you get close. You can normally look up a value in an index in O(log₂ n) time, where n is the number of rows, reading a similar number of index pages.
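
The phone-book strategy is essentially a binary search over the sorted key list. This sketch counts how many entries it inspects. The 1024 consecutive user_ids are made-up data.

```python
def find_page(sorted_ids, target):
    """Binary search over a sorted list of user_ids.

    Returns (position, probes), where probes counts how many entries were
    inspected: at most floor(log2(n)) + 1 for n entries.
    """
    lo, hi = 0, len(sorted_ids) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_ids[mid] == target:
            return mid, probes
        if sorted_ids[mid] < target:
            lo = mid + 1   # target is in the upper half: skip forward
        else:
            hi = mid - 1   # target is in the lower half: skip back
    return -1, probes

user_ids = list(range(1, 1025))  # 1024 hypothetical user_ids, already sorted
print(find_page(user_ids, 3))
```

Any of the 1024 ids is found in at most 11 probes, compared with up to 1024 for a page-by-page scan.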

Now, if the DB engine is given the query select * from users where user_id = 3, it has two choices. It can read every data page and look for the right value (it might use the fact that there is a primary key to stop at the first match). Alternatively, it can read the index to find the right data page, and then read just that data page.
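
You can ask an engine which of the two choices it made. This sketch uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in, which is an assumption. MySQL's own `EXPLAIN` reports the same decision in a different format. The `plan` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INT PRIMARY KEY gives SQLite a separate unique index on user_id.
conn.execute("CREATE TABLE users (user_id INT PRIMARY KEY, user_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "tim"), (2, "sue"), (3, "bill"), (4, "karen"), (5, "ed")])

def plan(sql):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Indexed column: SQLite reports a SEARCH using the primary-key index.
print(plan("SELECT * FROM users WHERE user_id = 3"))
# Unindexed column: SQLite has to SCAN every row.
print(plan("SELECT * FROM users WHERE user_name = 'bill'"))
```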

For concreteness and simplicity, assume the table has 1024 entries and each entry takes one data page. Assume each entry in the index tree takes one index page, and that the index is a balanced binary tree with the 1024 entries as its leaves, so it has 2047 pages in total and 11 levels (10 steps from the root down to a leaf). All these assumptions are suspect, but they get the point across. In particular, index entries are almost always much smaller than table rows, because you don't tend to index all columns at once, so far more of them fit on a page.

A table scan requires reading all 1024 data pages. Using the index requires reading 11 index pages (one per level) and one data page, 12 pages in total. Almost all database performance work is about minimising the number of pages read.
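
The page counts for this idealised tree can be checked with a few lines of arithmetic. A perfectly balanced binary tree with 1024 leaf pages has 1024 + 512 + ... + 1 = 2047 pages and 11 levels, which means 10 steps below the root. A lookup therefore touches one index page per level plus one data page. The helper name is hypothetical.

```python
def full_binary_tree(leaves: int):
    """Levels and total pages of a perfectly balanced binary tree.

    Assumes `leaves` is a power of two and that every node is one index page,
    the same simplifying assumptions as the text.
    """
    assert leaves & (leaves - 1) == 0, "leaves must be a power of two"
    levels = leaves.bit_length()   # 1024 -> 11 levels, root to leaves
    total_pages = 2 * leaves - 1   # 1024 + 512 + ... + 1
    return levels, total_pages

levels, total = full_binary_tree(1024)
scan_pages = 1024            # one data page per row
index_pages = levels + 1     # one index page per level, then one data page
print(levels, total, scan_pages, index_pages)
```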

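Multi-column indexes allow sets of data to be looked up quickly. If you have an index on (col1, col2), even queries that match only on col1 benefit, because the entries are sorted by col1 first. Queries that match only on col2 generally do not benefit in the same way.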
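
A sketch of why the leading column matters follows. It treats a hypothetical (col1, col2) index as a sorted list of tuples, filled here with the (comment_id, voter_id) pairs from VOTES. Entries sharing a col1 value are contiguous, so they can be found by binary search. Entries sharing a col2 value are scattered, so every entry has to be checked.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (col1, col2) index entries, kept sorted by col1 then col2.
index = sorted([(1, 5), (1, 4), (3, 1), (2, 5), (2, 4)])

def match_col1(entries, value):
    """All entries whose first column equals `value`, found by binary search."""
    lo = bisect_left(entries, (value,))                  # (v,) sorts before (v, anything)
    hi = bisect_right(entries, (value, float("inf")))    # past the last (v, anything)
    return entries[lo:hi]

def match_col2(entries, value):
    """Matching only on the second column: no contiguous range, so scan everything."""
    return [e for e in entries if e[1] == value]

print(match_col1(index, 2))
print(match_col2(index, 4))
```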

A CREATE INDEX statement simply specifies which columns are indexed and whether duplicate values are allowed (CREATE UNIQUE INDEX disallows them).
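
The following SQLite sketch shows the two flavours side by side, with SQLite standing in for MySQL. The index names are hypothetical. The rule that each user may vote at most once per comment is also an assumption, used only to motivate the unique index; the question's schema does not state it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, vote INT, "
             "comment_id INT, voter_id INT)")
# Plain index: duplicates allowed.
conn.execute("CREATE INDEX ix_votes_voter ON votes (voter_id)")
# Unique index: each (comment_id, voter_id) pair may appear only once (assumed rule).
conn.execute("CREATE UNIQUE INDEX ux_votes_comment_voter ON votes (comment_id, voter_id)")

conn.execute("INSERT INTO votes VALUES (1, -1, 1, 5)")
conn.execute("INSERT INTO votes VALUES (2, 1, 2, 5)")       # same voter_id again: fine
try:
    conn.execute("INSERT INTO votes VALUES (3, 1, 1, 5)")   # same (comment_id, voter_id)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```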

Using the book analogy again, CREATE INDEX ix_comment_id ON votes (comment_id, voter_id) creates a list ordered by comment_id and then voter_id, with each entry holding a reference to the corresponding data row:

+------------+--------------+---------+
| comment_id | voter_id     | row_ref |
+------------+--------------+---------+
|          1 |            4 |    ref1 |
|          1 |            5 |    ref2 |
|          2 |            4 |    ref3 |
|          2 |            5 |    ref4 |
|          3 |            1 |    ref5 |
+------------+--------------+---------+
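
The same ordered list can be built from the VOTES rows in the question. This sketch uses vote_id as the row reference, which is an assumption made for illustration; a real engine stores some kind of row locator instead. Each row_ref in the table then corresponds to the vote_id in the third position.

```python
# The VOTES rows from the question: (vote_id, vote, comment_id, voter_id).
votes = [
    (1, -1, 1, 5),
    (2,  1, 1, 4),
    (3,  1, 3, 1),
    (4, -1, 2, 5),
    (5,  1, 2, 4),
]

# Index entries: the key columns, plus vote_id as the reference back to the row.
ix_comment_id = sorted((comment_id, voter_id, vote_id)
                       for vote_id, _vote, comment_id, voter_id in votes)

for entry in ix_comment_id:
    print(entry)
```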
