使用 DISTINCT 查询等价 [英] Query equivalence with DISTINCT

查看：63 发布时间：2021/6/15 19:50:27 sql performance query-optimization sql-server-2016

本文介绍了使用 DISTINCT 查询等价的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

让我们使用以下脚本创建一个简单的表 order(id: int, category: int, order_date: int)

Let us have a simple table order(id: int, category: int, order_date: int) created using the following script

IF OBJECT_ID('dbo.orders', 'U') IS NOT NULL  DROP TABLE dbo.orders

SELECT TOP 1000000
      NEWID() id, 
      ABS(CHECKSUM(NEWID())) % 100 category, 
      ABS(CHECKSUM(NEWID())) % 10000 order_date
INTO orders
FROM    sys.sysobjects
CROSS JOIN sys.all_columns

现在，我有两个等效的查询(至少我相信它们是等效的):

Now, I have two equivalent queries (at least I believe that they are equivalent):

-- Q1
select distinct o1.category,
       (select count(*) from orders o2 where order_date = 1 and o1.category = o2.category) 
from orders o1

-- Q2
select o1.category,
       (select count(*) from orders o2 where order_date = 1 and o1.category = o2.category) 
from (select distinct category from orders) o1

但是，当我运行这些查询时，它们具有明显不同的特征.对于我的数据，Q2 的速度要快两倍，这显然是由于查询计划在连接之前首先找到唯一类别(以下查询计划中的哈希匹配)所致.

However, when I run those queries they have a significantly different characteristic. The Q2 is twice faster for my data and it is clearly caused by the fact that the query plan first find unique categories (hash match in the following query plans) before the join.

如果添加请求的索引，差异仍然存在

The difference is still there if add requested index

CREATE NONCLUSTERED INDEX ix_order_date ON orders(order_date)
INCLUDE (category)

此外，Q2 还可以有效地使用以下指标，而 Q1 保持不变:

Moreover, the Q2 can use efficiently also the following index, whereas, the Q1 remains the same:

CREATE NONCLUSTERED INDEX ix_orders_kat ON orders(category, order_date)

我的问题是:

这些查询是否等效?
如果是，SQL Server 2016 查询优化器在 Q1 的情况下找到第二个查询计划的障碍是什么(我相信在这种情况下搜索空间一定非常小)?
如果不是，你能举个反例吗?

编辑

我提出这个问题的动机是，我想了解为什么查询优化器在重写简单查询方面如此糟糕，并且它们如此严重地依赖 SQL 语法.SQL 语言是一种声明性语言，因此，为什么即使对于这样的简单查询，SQL 查询处理器也经常由语法驱动?

My motivation for the question is that I would like to understand why query optimizers are so poor in rewriting even simple queries and they rely on SQL syntax so heavily. SQL language is a declarative language, therefore, why SQL query processors are driven by syntax so often even for simple queries like this?

使用 DISTINCT 查询等价 [英] Query equivalence with DISTINCT

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

使用 DISTINCT 查询等价 [英] Query equivalence with DISTINCT

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭