选择条件较慢的三个查询,但条件三个较快的三个查询的任意组合的同一查询 [英] Select query with three where conditions is slow, but the same query with any combination of two of the three where conditions is fast

查看:51
本文介绍了选择条件较慢的三个查询,但条件三个较快的三个查询的任意组合的同一查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下查询:

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

我有以下键和索引:

  • table_1.id主键.
  • 在table_1.col_condition_1上的索引
  • 在table_1.col_condition_2上的索引
  • 表_1.col_condition_1和表_1.col_condition_2上的复合索引

获取正确的索引.查询说明:

The correct indexes are getting picked up. Query explain:

+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+
|  | id | select_type |  table  |  type  |                            possible_keys                            |          key          | key_len |    ref     |   rows   |         Extra         |  |
+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+
|  |  1 | SIMPLE      | table_1 | range  | "the composite index", col_condition_1 index ,col_condition_2 index | "the composite index" |       7 |            | 11819433 | Using index condition |  |
|  |  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY               |       8 | table_1.id |        1 | Using where           |  |
+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+

表_1拥有约60 MM记录,表_2具有约4 MM记录.

table_1 has ~60 MM records, and table_2 has ~4 MM records.

查询需要60秒才能返回结果.

有趣的是:

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)

LIMIT 5000;

花费145毫秒返回结果,并选择与第一个查询相同的索引.

takes 145 ms to return a result and has the same indexes picked as the first query.

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_1 = 0
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

需要174毫秒才能返回结果.

takes 174 ms to return a result.

查询说明:

+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
| id | select_type |  table  |  type  |                            possible_keys                            |       key       | key_len |    ref     |   rows   |    Extra    |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
|  1 | SIMPLE      | table_1 | ref    | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_1 |       2 | const      | 30381842 | NULL        |
|  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY         |       8 | table_1.id |        1 | Using where |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+

还有

SELECT table_1.id

FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)

WHERE
table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)

LIMIT 5000;

大约需要1秒才能返回结果.

takes about 1 second to return a result.

查询说明:

+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
| id | select_type |  table  |  type  |                            possible_keys                            |       key       | key_len |    ref     |   rows   |         Extra         |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
|  1 | SIMPLE      | table_1 | range  | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_2 |       5 |            | 36254294 | Using index condition |
|  1 | SIMPLE      | table_2 | eq_ref | PRIMARY,id_UNIQUE                                                   | PRIMARY         |       8 | table_1.id |        1 | Using where           |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+

另外,当我分别使用每个where条件时,查询会在约100毫秒内返回结果.

Also when I use every where condition separately, the query returns a result in ~100 ms.

我的问题是,即使同时使用正确的索引并使用任意两个条件执行查询,为什么在同时使用三个where条件时查询要花大量时间(60秒)才能返回结果三个条件也可以在更短的时间内返回结果.

My question is why the query takes a significant amount of time (60 seconds) to return a result when using the three where conditions together even though it looks like the correct indexes are getting used and executing the query with any two of the three where conditions also returns a result in much less time.

还有,有没有一种方法可以优化此查询?

Also, is there a way to optimize this query?

谢谢.

创建表:

表_1:

CREATE TABLE `table_1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col_condition_1` tinyint(1) DEFAULT '0',
  `col_condition_2` int(11) DEFAULT NULL,
  `date_col` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `compositeidx` (`col_condition_1`,`col_condition_2`),
  KEY `col_condition_1_idx` (`col_condition_1`),
  KEY `col_condition_2_idx` (`col_condition_2`)
) ENGINE=InnoDB AUTO_INCREMENT=68272192 DEFAULT CHARSET=utf8

表_2:

CREATE TABLE `table_2` (
  `id` bigint(20) NOT NULL,
  `date_col` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

推荐答案

类似的问题往往需要尝试并进行测试,以查看它们的工作情况.

Problems like this tend to require trying things and testing to see how well they work.

因此,从此开始:

SELECT
table_1.id
FROM
table_1
LEFT JOIN table_2
ON table_1.id = table_2.id
AND table_1.date_col <= table_2.date_col
WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND table_2.id is NULL

LIMIT 5000;

这与您的查询等效的逻辑推理: 原始查询的WHERE语句(table_2.id is NULL OR table_1.date_col > table_2.date_col)可以概括为仅包括不具有table_2记录或者table_2记录早于(或等于)table_1记录的table_1记录.

Logical reasoning on why this is equivalent to your query: Your original query's WHERE statement of (table_2.id is NULL OR table_1.date_col > table_2.date_col) can be summarized as "Only include table_1 records that either do NOT have a table_2 record, or where the table_2 record is earlier than (or equal to) the table_1 record.

我的查询版本使用反联接来排除所有存在于table_1记录之前(或等于table_1记录)的table_2的所有table_1记录.

My version of the query uses an anti-join to exclude all table_1 records where they exists a table_2 that is earlier than (or equal to) the table_1 record.

有许多可能的组合索引可以帮助此查询.这里有几个开始:

There are a number of possible composite indexes that may help this query. Here are a couple to start with:

对于表_2:(id,date_col)

对于表_1:(col_condition_1,id,date_col,col_condition_2)

请尝试我的查询和索引,并报告结果(包括EXPLAIN计划).

Please try my query and indexes, and report the results (including EXPLAIN plan).

这篇关于选择条件较慢的三个查询,但条件三个较快的三个查询的任意组合的同一查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆