MySQL indexes - what are the best practices?


Question

I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any field that I will be searching or selecting on with a WHERE clause, but sometimes it doesn't seem so black and white.

What are the best practices for MySQL indexes?

Example situations/dilemmas:


If a table has six columns and all of them are searchable, should I index all of them or none of them?


What are the negative performance impacts of indexing?


If I have a VARCHAR(2500) column which is searchable from parts of my site, should I index it?


Recommended answer

You should definitely spend some time reading up on indexing. There's a lot written about it, and it's important to understand what's going on.

Broadly speaking, an index imposes an ordering on the rows of a table.

For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.

Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data and number the rows in sequential order.

Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option: you scan the table, checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!

Now imagine that in addition to this table, you've got an index. This particular index is an index of the values in the third column. It lists all of the values from the third column in some meaningful order (say, alphabetically) and, for each of them, provides a list of the row numbers where that value appears.

Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look at N rows (where N is the number of rows), the binary search only requires that you look at about log2(N) index entries, even in the very worst case. Wow, that's sure a lot easier!
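To make the comparison concrete, here is a minimal Python sketch of the CSV-and-spreadsheet analogy. It is only a toy model: the `rows` data and the helper names `table_scan`, `build_index` and `index_lookup` are made up for illustration, and a real MySQL index is a B-tree maintained by the storage engine rather than a sorted Python list.

```python
from bisect import bisect_left, bisect_right

# A toy "table": rows are kept in insertion order, like lines in a CSV file.
rows = [
    (1, "alice", "M"),
    (2, "bob", "K"),
    (3, "carol", "M"),
    (4, "dave", "A"),
    (5, "erin", "Z"),
]


def table_scan(rows, value):
    """Check the third column of every row: N comparisons."""
    return [i for i, row in enumerate(rows) if row[2] == value]


def build_index(rows, col):
    """Build a sorted list of keys plus the row number each key came from."""
    pairs = sorted((row[col], i) for i, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    row_numbers = [i for _, i in pairs]
    return keys, row_numbers


def index_lookup(index, value):
    """Binary-search the sorted keys: roughly log2(N) comparisons."""
    keys, row_numbers = index
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return row_numbers[lo:hi]


third_column_index = build_index(rows, 2)
print(table_scan(rows, "M"))                  # [0, 2]
print(index_lookup(third_column_index, "M"))  # [0, 2]
```

Both calls return the same row numbers; the difference is how many entries had to be examined to find them.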

Of course, if you have this index and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.

So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.

On the other hand, reads become a lot faster.
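The write-side cost can be sketched in the same toy model. The `Table` class and its `index_updates` counter below are hypothetical, purely for illustration; the point is only that every insert touches the table once and then each index once.

```python
from bisect import insort


class Table:
    """Toy table that keeps one sorted index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_updates = 0  # counts extra work done on behalf of indexes

    def insert(self, row):
        # The table write itself is cheap: append at the end.
        self.rows.append(row)
        row_number = len(self.rows) - 1
        # Every index must also be kept in sorted order.
        for col, index in self.indexes.items():
            insort(index, (row[col], row_number))
            self.index_updates += 1


no_indexes = Table(indexed_columns=[])
all_indexed = Table(indexed_columns=[0, 1, 2])

for row in [(1, "alice", "M"), (2, "bob", "K")]:
    no_indexes.insert(row)
    all_indexed.insert(row)

print(no_indexes.index_updates)   # 0
print(all_indexed.index_updates)  # 6
```

This is why indexing every column of the six-column table is rarely the right default: each extra index adds work to every write, whether or not any query ever uses it.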

Hopefully that covers your first two questions (as others have answered, you need to find the right balance).

Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index that way: the index is ordered by the leading characters of each value, so a pattern that begins with a wildcard gives the engine no starting point to search from.
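The LIKE behaviour falls out of the sorted order, as this small Python sketch suggests. The function names and sample values are invented for illustration, and it models the index as a plain sorted list of strings rather than anything MySQL-specific.

```python
from bisect import bisect_left

# A sorted list of column values stands in for the index.
index = sorted([
    "barfoo", "foobar", "foo", "food", "foo-bar-baz", "xbar", "football",
])


def prefix_scan(index, prefix):
    """Seek to the first value >= prefix, then read while the prefix matches."""
    matches = []
    i = bisect_left(index, prefix)
    while i < len(index) and index[i].startswith(prefix):
        matches.append(index[i])
        i += 1
    return matches


def like_prefix_then_contains(index, prefix, needle):
    """Rough model of LIKE 'prefix%needle%': index seek, then filter."""
    candidates = prefix_scan(index, prefix)
    return [v for v in candidates if needle in v[len(prefix):]]


def like_contains(index, needle):
    """Rough model of LIKE '%needle%': every value must be examined."""
    return [v for v in index if needle in v]


print(like_prefix_then_contains(index, "foo", "bar"))
# ['foo-bar-baz', 'foobar']
print(like_contains(index, "bar"))
# ['barfoo', 'foo-bar-baz', 'foobar', 'xbar']
```

Values starting with "foo" sit next to each other in the sorted list, so one seek finds them all. Values containing "bar" are scattered through it, so nothing short of reading every entry will find them.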

Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, if you were searching WHERE b=5 AND c=1, the engine could not seek into that index, because the leading column a is not constrained, and it would generally need to scan the whole table.
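The left-to-right rule can be illustrated the same way. In this sketch the composite index on (a,b,c) is modelled as a sorted list of tuples; the sample rows and function names are hypothetical.

```python
from bisect import bisect_left

# Rows are (a, b, c). Sorting tuples orders them by a, then b, then c,
# which is how a composite index on (a, b, c) is ordered.
rows = [(2, 5, 1), (1, 5, 1), (1, 3, 9), (2, 4, 1), (1, 5, 2), (3, 5, 1)]
index_abc = sorted(rows)


def leftmost_lookup(index, prefix):
    """Find entries whose leading columns equal `prefix`, e.g. (a,) or (a, b)."""
    n = len(prefix)
    matches = []
    i = bisect_left(index, prefix)  # seek to the first possible match
    while i < len(index) and index[i][:n] == prefix:
        matches.append(index[i])
        i += 1
    return matches


def scan_b_and_c(index, b, c):
    """WHERE b = ? AND c = ?: matches are not adjacent, so check everything."""
    return [entry for entry in index if entry[1] == b and entry[2] == c]


print(leftmost_lookup(index_abc, (1,)))    # [(1, 3, 9), (1, 5, 1), (1, 5, 2)]
print(leftmost_lookup(index_abc, (1, 5)))  # [(1, 5, 1), (1, 5, 2)]
print(scan_b_and_c(index_abc, 5, 1))       # [(1, 5, 1), (2, 5, 1), (3, 5, 1)]
```

The rows with b=5 and c=1 end up at positions 1, 4 and 5 of the sorted list, so no single seek can reach them. An index that starts with b, such as (b,c), would put them side by side.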

Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indexes are implemented and used by query planners can vary pretty widely.
