为什么MySQL不使用JOIN上的主键加上ORDER? [英] Why doesn't MySQL use the primary key on JOIN plus ORDER?

查看:176
本文介绍了为什么MySQL不使用JOIN上的主键加上ORDER?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个适合你的(MySQL显然):

Here's a neat one for you (MySQL, obviously):


# Setting things up
DROP DATABASE IF EXISTS index_test_gutza;
CREATE DATABASE index_test_gutza;
USE index_test_gutza;

CREATE TABLE customer_order (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY(id)
);
INSERT INTO customer_order
    (id, invoice)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

CREATE TABLE customer_invoice (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    invoice_no MEDIUMINT UNSIGNED DEFAULT NULL,
    invoice_pdf LONGBLOB,
    PRIMARY KEY(id)
);
INSERT INTO customer_invoice
    (id, invoice_no)
    VALUES
    (1, 1),
    (2, 2),
    (3, 3),
    (4, 4),
    (5, 5);

# Ok, here's the beef
EXPLAIN
    SELECT co.id
    FROM customer_order AS co;

EXPLAIN
    SELECT co.id
    FROM customer_order AS co
    ORDER BY co.id;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice;

EXPLAIN
    SELECT co.id, ci.invoice_no
    FROM customer_order AS co
    LEFT JOIN customer_invoice AS ci ON ci.id=co.invoice
    ORDER BY co.id;

底部有四个EXPLAIN语句。前两个产生了你期望的结果:

There are four EXPLAIN statements at the bottom. The first two result in exactly what you'd expect:


+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | co    | index | NULL          | PRIMARY | 3       | NULL |    5 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

第三个已经很有趣了 - 请注意customer_order中的主键是如何不再使用的:

The third one is already interesting -- notice how the primary key in customer_order is not used any more:


+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 |             |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+-------------+

然而,第四个是zinger - 只需在主键上添加ORDER BY 就会导致customer_order上的filesort(这是预期的,因为它已经在上面被迷惑了):

The fourth one, however, is the zinger -- simply adding the ORDER BY on the primary key leads to filesort on customer_order (which is to be expected, given that it was already bewildered above):


+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                         | rows | Extra          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+
|  1 | SIMPLE      | co    | ALL    | NULL          | NULL    | NULL    | NULL                        |    5 | Using filesort |
|  1 | SIMPLE      | ci    | eq_ref | PRIMARY       | PRIMARY | 3       | index_test_gutza.co.invoice |    1 | Using index    |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------+------+----------------+

文件排序!而且我从不使用除customer_order表中的主键以外的任何内容进行排序,以及在customer_invoice表中使用JOIN的主键。那么,在所有好的和正确的名义中,为什么它会突然切换到filesort?!更重要的是,我该如何避免这种情况?为了记录,我很乐意接受一个记录在案的答案,解释为什么不能可以避免(如果是这样的话。)

Filesort! And that while I'm never using anything except the primary key in the customer_order table for ordering, and the primary key in the customer_invoice table for the JOIN. So then, in the name of all that is good and right, why does it switch to filesort all of a sudden?! And more importantly, how do I avoid this? For the record, I will gladly accept a documented answer explaining why this cannot be avoided (if that's the case.)

正如你现在可能怀疑的那样,这实际上是在生产中发生的,虽然表格绝不是很大(只有数百条记录),但发票表上的文件输出(包含一个PDF文件)当我运行类似于上面的查询时(我需要知道哪些订单已经发出发票,哪些不是),杀死了服务器。

As you probably suspect by now, this is actually happening in production, and although the tables are by no means huge (only hundreds of records), the filesort on the invoice table (which contains a PDF file) is killing the server when I run queries similar to the one above (which I need in order to know which orders have been issued invoices, and which weren't).

在你问之前,我设计了数据库,我认为我可以安全地将PDF文件存储在该表中,因为我永远不会需要任何搜索查询 - 我总是有它的主键在手边!

Before you ask, I designed the database, and I thought I was safe storing the PDF files in that table because I never ever need any search queries on it -- I always have its primary key at hand!

以下是以下评论中建议内容的概要,所以你不必阅读所有这些:

Here's a synopsis of what was suggested in the comments below, so you don't have to read all of that:


  • *你应该在customer_order.invoice *上添加一个键 - 我实际上在生产中尝试过,它没有区别(因为它不应该)

  • 你应该使用 USE INDEX - 尝试过,没用。我也试过 FORCE INDEX - 没有结果(没有任何改变)

  • 你过度简化了用例,我们需要实际的生产查询 - 我可能在第一次迭代中剥离了太多,所以我更新了它(我刚刚添加了,ci.invoice_no in最后几次查询的 SELECT 。为了记录,如果有人真的很好奇,这里是生产查询,完全一样(这将检索订单的最后一页):

  • *You should add a key on customer_order.invoice* -- I actually tried that in production, it makes no difference (as it shouldn't)
  • You should use USE INDEX -- tried that, didn't work. I also tried FORCE INDEX -- no result either (no change whatsoever)
  • You oversimplified the use case, we need the actual production query -- I might have stripped it a bit too much in the first iteration, so I updated it (I just added , ci.invoice_no in the SELECT for the last couple of queries). For the record, if anybody's really curious, here's the production query, exactly as it is (this retrieves the last page of orders):

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id!="" as A,
    corder.payment_received as pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id=corder.buyer
LEFT JOIN invoice as invoice ON invoice.id=corder.invoice
LEFT JOIN invoice as proforma ON proforma.id=corder.proforma
ORDER BY
    id DESC 
LIMIT 400, 20;

上面的查询(这也是我在生产中运行的那个)大约需要14秒才能运行。这是在生产中执行的简化查询,如上面的用例所示:

The query above (which, again, is exactly what I run in production) takes about 14 seconds to run. Here's the simplified query, as shown in the use case above, executed on production:


SELECT
    corder.id,
    invoice.invoice_no
FROM
    corder
LEFT JOIN invoice ON invoice.id=corder.invoice
ORDER BY
    corder.id DESC 
LIMIT 400, 20;

这个需要13秒才能运行。请注意,只要我们谈论结果的最后一页(我们是),LIMIT就没有任何区别。也就是说,在涉及filesort时,检索最后12个结果或所有412个结果之间绝对没有显着差异。

This one takes 13 seconds to run. Be advised the LIMIT makes no difference whatsoever as long as we're talking about the last page of results (which we are). That is, there's absolutely no significant difference between retrieving the last 12 results or all 412 results when filesort is involved.

ypercube的答案不仅正确,而且遗憾的是它似乎是唯一合法的答案。我试图进一步将条件与字段分开,因为 SELECT * FROM corder 子查询最终可能涉及大量数据,如果corder本身包含LONGBLOB(并复制来自的字段)子查询中的主要查询是不优雅的,但不幸的是它似乎不起作用:

ypercube's answer is not only correct, but unfortunately it seems to be the only legitimate one. I tried to further separate conditions from fields, because the SELECT * FROM corder subquery can end up involving a lot of data, if corder itself contains LONGBLOBs (and duplicating the fields from the main query in the subquery is inelegant), but unfortunately it doesn't seem to work:


SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
WHERE corder.id IN (
    SELECT id
    FROM corder
    ORDER BY id DESC
    LIMIT 400,20
)
ORDER BY
    corder.id DESC;

此操作失败,并显示以下错误消息:

This fails, with the following error message:


ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'



<我正在使用MySQL 5.1.61,这在5.1家族中是最近才出现的(显然5.5x中也不支持)。

I'm using MySQL 5.1.61, which is reasonably recent in the 5.1 family (and apparently this is not supported in 5.5.x either).

推荐答案

你能试试这个版本吗(它基本上先得到 corder 表的420行,保留其中的20行然后再做3个外表加入):

Can you try this version (it basically gets first the 420 rows of the corder table, keeps the 20 of them and then does the 3 outer joins):

SELECT
    corder.id,
    corder.public_id,
    CONCAT(buyer.fname," ",buyer.lname) AS buyer_name,
    corder.status,
    corder.payment,
    corder.reserved AS R,
    corder.tracking_id != "" AS A,
    corder.payment_received AS pay_date,
    invoice.invoice_no AS inv,
    invoice.receipt_no AS rec,
    invoice.public AS pub_inv,
    proforma.proforma_no AS prof,
    proforma.public AS pub_pf,
    corder.rating,
    corder.rating_comments!="" AS got_comment
FROM
    ( SELECT * 
      FROM corder
      ORDER BY
        id DESC 
      LIMIT 400, 20
    )
    AS corder
LEFT JOIN user as buyer ON buyer.id = corder.buyer
LEFT JOIN invoice AS invoice ON invoice.id = corder.invoice
LEFT JOIN invoice AS proforma ON proforma.id = corder.proforma
ORDER BY
    corder.id DESC ;

这篇关于为什么MySQL不使用JOIN上的主键加上ORDER?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆