创建 & 后 MySQL 性能提升下降指数 [英] MySQL performance boost after create & drop index

查看:67
本文介绍了创建 & 后 MySQL 性能提升下降指数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含大约 400 万行的大型 MySQL 和 MyISAM 表,运行在 core 2 duo、8G RAM 笔记本电脑上.

I have a large MySQL, MyISAM table of around 4 million rows running in a core 2 duo, 8G RAM laptop.

该表有 30 列,包括 varchar、decimal 和 int 类型.

This table has 30 columns including varchar, decimal and int types.

我在 varchar(16) 上有一个索引.我们将此列称为:indexed_varchar_column".

I have an index on a varchar(16). Let's call this column: "indexed_varchar_column".

我的查询是

SELECT 9 columns FROM the_table WHERE indexed_varchar_column = 'something';

对于我查询的每个东西",它总是返回大约 5000 行.

It always returns around 5000 rows for every 'something' I query against.

对查询的解释返回:

+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+
| id | select_type | table       | type | possible_keys                                      | key                                        | key_len | ref   | rows | Extra       |
+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | the_table   | ref  | many indexes including indexed_varchar_column      | another_index NOT: indexed_varchar_column! | 19      | const | 5247 | Using where |
+----+-------------+-------------+------+----------------------------------------------------+--------------------------------------------+---------+-------+------+-------------+

第一件事是我不确定为什么选择 another_index.事实上,它选择了一个索引,它是 indexed_varchar_column 和另外 2 列(构成所选列的一部分)的复合索引.也许这是有道理的,因为不必读取查询中的 2 个列可能会使事情变得更快.真正的问题是以下问题:

First thing is I'm not sure is why another_index is chosen. In fact it chooses an index which is a composite index of indexed_varchar_column and another 2 columns (which form part of the selected ones). Perhaps this makes sense since it may make things a bit faster for not having to read 2 of the columns in the query. The real QUESTION is the following one:

对于我匹配的每个东西",查询需要 5 秒.第二次我查询某事"需要 0.15 秒(我猜是因为查询被缓存).当我针对 'something_new' 运行另一个查询时,它又需要 5 秒.所以,它是一致的.

The query takes 5 seconds for every 'something' I match. On the 2nd time I query against 'something' it takes 0.15 secs (I guess because the query is being cached). When I run another query against 'something_new' it takes again 5 seconds. So, it is consistent.

问题是:我发现创建一个索引(另一个复合索引,包括我的 indexed_varchar_column)并再次删除它会产生所有针对新something_other"的进一步查询只需要 0.15 秒.请注意 1) 我创建了一个索引 2) 再次删除它.所以一切都处于相同的状态.

THE PROBLEM IS: I discovered that creating an index (another composite index including my indexed_varchar_column) and dropping it again produces that all further queries against new 'something_other' take only 0.15 secs. Please note that 1) I create an index 2) drop it again. So everything is in the same state.

我猜构建和删除索引所需的所有操作都会使 SQL 引擎缓存一些然后重用的内容.在所有这些之后,当我对查询运行 EXPLAIN 时,我得到的结果与以前完全相同.

I guess all the operations needed for building and dropping indices make the SQL engine to cache something that is then reused. When I run EXPLAIN on a query after all this I get exactly the same as before.

如何继续了解创建-删除索引过程中缓存的内容,以便我可以在不操作索引的情况下缓存它?

How can I proceed to understand what is cached in the create-drop index procedure so that I can cache it without manipulating indices?

更新:

遵循 Marc B 的评论,该评论表明当 mySQL 创建索引时,它会在内部执行 SELECT ... 我尝试了以下操作:

Following a comment from Marc B that suggested that when mySQL creates an index it internally does a SELECT... I tried the following:

SELECT * FROM my_table;

花了 30 秒,返回了 400 万行.好消息是所有进一步的查询再次非常快(直到我重新启动系统).请注意,重新启动后查询再次变慢.我猜这是因为 mySQL 使用了某种操作系统缓存.

It took 30 secs and returned 4 million rows. The good thing is that all further queries are very fast again (until I reboot the system). Please note that after rebooting the queries are slow again. I guess this is because mySQL is using some sort of OS caching.

有什么想法吗?如何显式缓存我猜的表?

Any idea? How can I explicitly cache the table I guess?

更新 2:或许我应该提到这张表可能严重支离破碎.这是 400 万行,但我会定期删除很多旧字段.我也添加新的.因为我每天都有很大的 ID 间隔(对于删除的行),所以我删除主索引 (ID) 并使用连续数字再次创建它.该表可能会非常碎片化,因此 IO 一定是一个问题...不知道该怎么办.

UPDATE 2: Perhaps I should have mentioned that this table may be severely fragmented. It's 4 million rows but I remove lots of old fields regularly. I also add new ones. Since I had large gaps in IDs (for the rows deleted) every day I drop the primary index (ID) and create it again with consecutive numbers. The table may be then very fragmented and therefore IO must be an issue... Not sure what to do.

推荐答案

感谢大家的帮助.

最后我发现(感谢 Marc B 的提示)我的表在多次插入和删除后严重碎片化.几个小时前我用这个信息更新了这个问题.有两件事有帮助:

Finally I discovered (thanks to the hint of Marc B) that my table was severely fragmented after many INSERTs and DELETEs. I updated the question with this info some hours ago. There are two things that help:

1)

ALTER TABLE my_table ORDER BY indexed_varchar_column;

2) 运行:

myisamchk --sort-records=4 my_table.MYI  (where 4 corresponds to my index)

我相信这两个命令是等效的.即使在系统重新启动后,查询也很快.我已将此 A​​LTER TABLE ORDER BY 命令放在每天运行的 cron 上.这需要 2 分钟,但值得.

I believe both commands are equivalent. Queries are fast even after a system reboot. I've put this ALTER TABLE ORDER BY command on a cron that is run everyday. It takes 2 minutes but it's worth it.

这篇关于创建 & 后 MySQL 性能提升下降指数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆