Adding limit clause to MySQL query slows it down dramatically

Problem description

I'm trying to troubleshoot a performance issue on MySQL, so I wanted to create a smaller version of a table to work with. When I add a LIMIT clause to the query, it goes from about 2 seconds (for the full insert) to astronomical (42 minutes).

mysql> select pr.player_id, max(pr.insert_date) as insert_date from player_record pr
inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-15'
group by pr.player_id;
+------------+-------------+
| player_id  | insert_date |
+------------+-------------+
| 1002395119 | 2012-05-14  |
...
| 1002395157 | 2012-05-14  |
| 1002395187 | 2012-05-14  |
| 1002395475 | 2012-05-14  |
+------------+-------------+
105776 rows in set (2.19 sec)

mysql> select pr.player_id, max(pr.insert_date) as insert_date from player_record pr
inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-15' 
group by pr.player_id limit 1;
+------------+-------------+
| player_id  | insert_date |
+------------+-------------+
| 1000000080 | 2012-05-14  |
+------------+-------------+
1 row in set (42 min 23.26 sec)

mysql> describe player_record;
+------------------------+------------------------+------+-----+---------+-------+
| Field                  | Type                   | Null | Key | Default | Extra |
+------------------------+------------------------+------+-----+---------+-------+
| player_id              | int(10) unsigned       | NO   | PRI | NULL    |       |
| insert_date            | date                   | NO   | PRI | NULL    |       |
| xp                     | int(10) unsigned       | YES  |     | NULL    |       |
+------------------------+------------------------+------+-----+---------+-------+
17 rows in set (0.01 sec) (most columns removed)

There are 20 million rows in the player_record table, so I am creating two tables in memory for the specific dates I am looking to compare.

CREATE temporary TABLE date_curr 
(
      player_id INT UNSIGNED NOT NULL, 
      insert_date DATE,     
      PRIMARY KEY player_id (player_id, insert_date)
 ) ENGINE=MEMORY;
INSERT into date_curr 
SELECT  player_id, 
        MAX(insert_date) AS insert_date 
FROM player_record 
WHERE insert_date BETWEEN '2012-05-15' AND '2012-05-15' + INTERVAL 6 DAY
GROUP BY player_id;

CREATE TEMPORARY TABLE date_prev LIKE date_curr;
INSERT into date_prev 
SELECT pr.player_id,
       MAX(pr.insert_date) AS insert_date 
FROM  player_record pr 
INNER join date_curr dc 
      ON pr.player_id = dc.player_id 
WHERE pr.insert_date < '2012-05-15' 
GROUP BY pr.player_id limit 0,20000;

date_curr has 216k entries, and date_prev has 105k entries if I don't use a limit.

These tables are just part of the process, used to trim down another table (500 million rows) to something manageable. date_curr includes the player_id and insert_date from the current week, and date_prev has the player_id and most recent insert_date from BEFORE the current week for any player_id present in date_curr.

Here is the EXPLAIN output:

mysql> explain SELECT pr.player_id, 
                      MAX(pr.insert_date) AS insert_date 
               FROM   player_record pr 
               INNER  JOIN date_curr dc 
                      ON pr.player_id = dc.player_id
               WHERE  pr.insert_date < '2012-05-15' 
               GROUP  BY pr.player_id 
               LIMIT  0,20000;                    
+----+-------------+-------+-------+---------------------+-------------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type  | possible_keys       | key         | key_len | ref  | rows   | Extra                                        |
+----+-------------+-------+-------+---------------------+-------------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | pr    | range | PRIMARY,insert_date | insert_date | 3       | NULL | 396828 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | dc    | ALL   | PRIMARY             | NULL        | NULL    | NULL | 216825 | Using where; Using join buffer               |
+----+-------------+-------+-------+---------------------+-------------+---------+------+--------+----------------------------------------------+
2 rows in set (0.03 sec)

This is on a system with 24 GB of RAM dedicated to the database, which is currently pretty much idle. This specific database is the test copy, so it is completely static. I did a MySQL restart and it still shows the same behavior.

Here is the 'show profile all' output; most of the time is spent on 'Copying to tmp table'.

| Status               | Duration   | CPU_user   | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function       | Source_file   | Source_line |
| Copying to tmp table | 999.999999 | 999.999999 |   0.383941 |            110240 |               18983 |        16160 |           448 |             0 |                 0 |                 0 |                43 |     0 | exec                  | sql_select.cc |        1976 |
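
For reference, this kind of profile can be captured per session with MySQL's built-in query profiling (a minimal sketch; the feature is deprecated in newer versions in favour of the Performance Schema, and the query number depends on what you run after enabling it):

SET profiling = 1;            -- enable per-statement profiling for this session
-- run the slow statement here
SHOW PROFILES;                -- list the captured statements and their query ids
SHOW PROFILE ALL FOR QUERY 1; -- assumes the slow query was the first statement profiled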

Answer

A bit of a long answer but I hope you can learn something from this.

Based on the evidence in the EXPLAIN output, there were two possible indexes that the MySQL query optimizer could have used. They are as follows:

possible_keys
PRIMARY,insert_date 

However, the MySQL query optimizer decided to use the following index:

key
insert_date
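
If you want to double-check the candidate indexes yourself (their names, columns, and the cardinality estimates the optimizer relies on), standard MySQL lets you list them; the table name is taken from the question:

SHOW INDEX FROM player_record;
-- The Cardinality column holds the statistics the optimizer uses when choosing between indexes.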

This is one of the rare occasions where the MySQL query optimizer picked the wrong index, and there is a probable cause for it: you are working on a static development database, which you probably restored from production to develop against.

When the MySQL optimizer needs to decide which index to use in a query, it looks at the statistics for all of the possible indexes. You can read more about those statistics at http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-other-changes-statistics-estimation.html as a starter.

So when you update, insert, and delete rows in a table, you change the index statistics. It might be that, because of the static data, the MySQL server had stale statistics and chose the wrong index. At this point, however, this is just a guess at a possible root cause.
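
If stale statistics are the culprit, refreshing them is cheap to try (a sketch; ANALYZE TABLE is standard MySQL and only recalculates key distribution statistics, it does not rebuild the table):

ANALYZE TABLE player_record;
-- Re-run EXPLAIN on the query afterwards to see whether the chosen key changes.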

Now let's dive into the indexes. There were two possible indexes to use: the primary key index and the index on insert_date, and MySQL used the insert_date one. Remember that during query execution MySQL will only ever use one index. Let's look at the difference between the primary key index and the insert_date index.

Quick facts about the primary key index (a.k.a. clustered index):

  1. The primary key index is normally a btree structure that contains the data rows themselves, i.e. it is the table.

Quick facts about a secondary index (a.k.a. non-clustered index):

  1. A secondary index is normally a btree structure that contains the data being indexed (the columns in the index) and a pointer to the location of the data row in the primary key index.

This is a subtle but big difference.

Let me explain: when you read the primary key index, you are reading the table, and the table is also stored in primary key order. Thus, to find a value you search the index and read the data in the same pass, which is one operation.

When you read a secondary index, you search the index to find the pointer, then read the primary key index to find the data the pointer refers to. That is essentially two operations, making a secondary index read roughly twice as costly as a primary key index read.

In your case, since the optimizer chose insert_date as the index to use, it was doing double the work just to perform the join. That is problem number one.

Now, LIMIT is applied in the last stage of query execution. MySQL has to take the entire record set, sort it (if it is not already sorted) according to the ORDER BY and GROUP BY conditions, then take the number of records you asked for and send them back based on the LIMIT clause. MySQL has to do a lot of work to keep track of which records to send and where it is in the record set, and so on. LIMIT does have a performance cost, but I suspect there is another contributing factor; read on.

Look at your GROUP BY: it is on player_id, while the index being used is on insert_date. GROUP BY essentially orders your record set, but here there was no suitable index to use for that ordering (remember, an index is sorted in the order of the column(s) it contains). In effect, you were asking for a sort/order on player_id while the index in use was sorted on insert_date.

This step caused the filesort problem: the data returned from reading the secondary index and then the primary key index (remember, the two operations) has to be sorted. Sorting is normally done on disk, as it is a very expensive operation to do in memory. So the entire query result was written to disk and sorted painfully slowly to get you your results.

By removing the insert_date index, MySQL will now use the primary key index, which means the data is already ordered (for ORDER BY / GROUP BY purposes) by player_id and insert_date. This eliminates the need to read the secondary index and then follow the pointer into the primary key index, i.e. the table, and since the data is already sorted, MySQL has very little work to do when applying the GROUP BY part of the query.
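
A non-destructive way to test this before actually dropping the index would be an index hint (a sketch; FORCE INDEX is standard MySQL syntax, and the query is the one from the question):

SELECT pr.player_id,
       MAX(pr.insert_date) AS insert_date
FROM   player_record pr FORCE INDEX (PRIMARY)   -- steer the optimizer onto the clustered index
INNER  JOIN date_curr dc
       ON pr.player_id = dc.player_id
WHERE  pr.insert_date < '2012-05-15'
GROUP  BY pr.player_id
LIMIT  0,20000;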

The following is again a bit of an educated guess; if you could post the EXPLAIN output from after the index was dropped, I could probably confirm my thinking. By using the wrong index, the results were sorted on disk in order to apply the LIMIT properly. Removing the LIMIT probably allows MySQL to sort in memory, as it does not have to apply the LIMIT and keep track of what is being returned. The LIMIT probably also caused the temporary table to be created. Once again, it is difficult to say without seeing the difference between the statements, i.e. the EXPLAIN output.
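
For completeness, dropping the index and re-checking the plan might look like this (a sketch; the secondary index name insert_date is taken from the key column of the EXPLAIN output above, so verify it with SHOW INDEX first):

ALTER TABLE player_record DROP INDEX insert_date;  -- assumes the index is actually named insert_date
-- Re-run the EXPLAIN from above; if the theory holds, the key column should now show PRIMARY
-- and the "Using filesort" note should disappear.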

Hopefully this gives you a better understanding of indexes and why they are a double-edged sword.
