什么索引可以提高JOIN和GROUP BY的性能 [英] What indexes to improve performance of JOIN and GROUP BY

查看:52
本文介绍了什么索引可以提高JOIN和GROUP BY的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经设置了一些表并运行了一个查询.但是在我的解释中,它将在临时表中生成SQL结果(我认为这是由于GROUP BY引起的)

I have setup some tables and ran a query. However in my explain it would appear the SQL results in a temporary table being generated ( I assume this is because of the GROUP BY)

我添加了一些索引来加快查询的速度,但是想知道是否有一种方法可以停止使用临时表,是否还有其他方法可以加快使用索引的查询速度?

I have added some indexes to speed up the query but wondering if there was a way to stop the use of a temporary table and if there is any other way I can speed my query up using indexes?

CartData

CREATE TABLE `cartdata` (
    `IDCartData` INT(11) NOT NULL AUTO_INCREMENT,
    `CartOrderref` VARCHAR(25) NOT NULL DEFAULT '',
    `UserID` INT(11) NOT NULL DEFAULT '0',
    `LastUpdate` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
    CURRENT_TIMESTAMP,
    `ManualContactName` VARCHAR(100) NOT NULL DEFAULT '',
    `ManualOrderConfirmationEmail` VARCHAR(100) NOT NULL DEFAULT '',
    PRIMARY KEY (`IDCartData`),
    INDEX `CartOrderref` (`CartOrderref`)
)

CartSplitData

    CREATE TABLE `cartsplitdata` (
        `IDCartSupplierData` INT(11) NOT NULL AUTO_INCREMENT,
        `IDCartData` INT(11) NOT NULL DEFAULT '0',
        `supplierid` INT(11) NOT NULL DEFAULT '0',
        `DeliveryDate` DATE NOT NULL DEFAULT '2000-01-01',
        `AccountNumber` VARCHAR(50) NOT NULL DEFAULT '',
        `ManualOrderref` VARCHAR(50) NOT NULL DEFAULT '',
        `lastupdate` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        PRIMARY KEY (`IDCartSupplierData`),
        INDEX `cartdatasupplierid` (`IDCartData`, `supplierid`)
    )

我的示例查询

        EXPLAIN SELECT max(CartData.idCartDATA) AS idCartDATA , CartData.*, CartSplitData.*
        FROM CartData
        JOIN CartSplitData ON CartSplitData.IDCartDATA = CartDATA.IDCartData
        WHERE  CartData.CartOrderref = 'XXXXXXXXX'
        group by CartSplitData.SUPPLIERID

说明查询结果

推荐答案

专业提示避免使用 SELECT * SELECT表.* 敏感查询.而是按名称选择您实际需要使用的列

Pro tip Avoid SELECT * or SELECT table.* in performance-sensitive queries. Instead select, by name, the columns you actually need to use

专业提示:MySQL对您正在使用的 GROUP BY 有一个臭名昭著的非标准扩展,并且可能会被滥用.读这个. https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html 如果您遵循第一个提示,那么遵循第二个提示会容易得多.

Pro tip MySQL has a notorious nonstandard extension to GROUP BY which you are using, and possibly misusing. Read this. https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html If you followed the first pro tip, following the second would be much easier.

专业提示,避免抛出"大量单列索引,以期加快查询速度.相反,创建索引(通常是复合索引)以匹配实际查询的需求.阅读此 https://use-the-index-luke.com .

Pro tip Avoid "throwing in" lots of single column indexes in hopes of accelerating your queries. Instead, create indexes, often compound indexes, matching the needs of your actual query. Read this https://use-the-index-luke.com .

专业提示 使用临时功能;使用出现在EXPLAIN输出中的filesort 不一定是不好的.这仅表示查询引擎必须在返回部分结果集之前对其进行缓存. temporary 东西不是一个实际的表,它是一个RAM结构.如果它太大而导致RAM泛滥,MySQL会将其溢出到磁盘上.但是你不是.

Pro tip Using temporary; using filesort appearing in EXPLAIN output is not necessarily bad. It simply means that the query engine has to cache a partial result set before returning it. The temporary thing isn't an actual table, it's a RAM structure. If it is so big it swamps RAM, MySQL will spill it to disk. But yours isn't.

所有这些,让我们重构您的查询.我猜想您要为每个 CartSplitData.SUPPLIERID 检索具有最大 idCartDATA 值的行.

All that being said, let's refactor your query. I guess you want to retrieve the rows with the largest idCartDATA value for each CartSplitData.SUPPLIERID.

因此,我们将其写为子查询.

So let's write that as a subquery.

                  SELECT max(IDCartDATA) AS IDCartDATA, SUPPLIERID
                    FROM CartSplitData
                   GROUP BY SUPPLIERID

通过将复合索引放在CartSplitData上,可以大大加快该查询的速度:(SUPPLIERID,IDCartDATA).

This query can be sped up, dramatically, by putting a compound index on CartSplitData: (SUPPLIERID, IDCartDATA).

接下来,让我们重写您的主查询,以找到与该子查询中的ID相匹配的行.

Next, let's rewrite your main query to find the rows matching the ids in that subquery.

SELECT CartData.*             /* * hammers performance */
       CartSplitData.*        /* * hammers performance */
  FROM CartData
  JOIN CartSplitData ON CartSplitData.IDCartDATA = CartDATA.IDCartData
  JOIN (
                  SELECT max(IDCartDATA) AS IDCartDATA, SUPPLIERID
                    FROM CartSplitData
                   GROUP BY SUPPLIERID
       )x ON x.SUPPLIERID = CartSplitData.SUPPLIERID
         AND x.IDCartData = CartSplitData.IDCartData
 WHERE CartData.CartOrderref = 'XXXXXXXXX'

您在 CartData.CartOrderref 上的索引以及该复合索引创建的^^^都将帮助此外部查询.

Your index on CartData.CartOrderref will help this outer query, as will the compound index created ^^^.

这篇关于什么索引可以提高JOIN和GROUP BY的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆