使用 DISTINCT 查询等价 [英] Query equivalence with DISTINCT

查看:63
本文介绍了使用 DISTINCT 查询等价的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们使用以下脚本创建一个简单的表 order(id: int, category: int, order_date: int)

Let us have a simple table order(id: int, category: int, order_date: int) created using the following script

IF OBJECT_ID('dbo.orders', 'U') IS NOT NULL  DROP TABLE dbo.orders

SELECT TOP 1000000
      NEWID() id, 
      ABS(CHECKSUM(NEWID())) % 100 category, 
      ABS(CHECKSUM(NEWID())) % 10000 order_date
INTO orders
FROM    sys.sysobjects
CROSS JOIN sys.all_columns

现在,我有两个等效的查询(至少我相信它们是等效的):

Now, I have two equivalent queries (at least I believe that they are equivalent):

-- Q1
select distinct o1.category,
       (select count(*) from orders o2 where order_date = 1 and o1.category = o2.category) 
from orders o1

-- Q2
select o1.category,
       (select count(*) from orders o2 where order_date = 1 and o1.category = o2.category) 
from (select distinct category from orders) o1

但是,当我运行这些查询时,它们具有明显不同的特征.对于我的数据,Q2 的速度要快两倍,这显然是由于查询计划在连接之前首先找到唯一类别(以下查询计划中的哈希匹配)所致.

However, when I run those queries they have a significantly different characteristic. The Q2 is twice faster for my data and it is clearly caused by the fact that the query plan first find unique categories (hash match in the following query plans) before the join.

如果添加请求的索引,差异仍然存在

The difference is still there if add requested index

CREATE NONCLUSTERED INDEX ix_order_date ON orders(order_date)
INCLUDE (category)

此外,Q2 还可以有效地使用以下指标,而 Q1 保持不变:

Moreover, the Q2 can use efficiently also the following index, whereas, the Q1 remains the same:

CREATE NONCLUSTERED INDEX ix_orders_kat ON orders(category, order_date)

我的问题是:

  1. 这些查询是否等效?
  2. 如果是,SQL Server 2016 查询优化器在 Q1 的情况下找到第二个查询计划的障碍是什么(我相信在这种情况下搜索空间一定非常小)?
  3. 如果不是,你能举个反例吗?

编辑

我提出这个问题的动机是,我想了解为什么查询优化器在重写简单查询方面如此糟糕,并且它们如此严重地依赖 SQL 语法.SQL 语言是一种声明性语言,因此,为什么即使对于这样的简单查询,SQL 查询处理器也经常由语法驱动?

My motivation for the question is that I would like to understand why query optimizers are so poor in rewriting even simple queries and they rely on SQL syntax so heavily. SQL language is a declarative language, therefore, why SQL query processors are driven by syntax so often even for simple queries like this?

推荐答案

查询在功能上是等效的,这意味着它们应该返回相同的数据.

The queries are functionally equivalent, meaning that they should return the same data.

但是,SQL 引擎对它们的解释不同.第一个 (SELECT DISTINCT) 生成所有结果然后删除重复项.

However, they are interpreted differently by the SQL engine. The first (SELECT DISTINCT) generates all the results and then removes the duplicates.

第二个首先提取不同的值,所以子查询只在适当的子集上调用.

The second extracts the distinct values first, so the subquery is only called on the appropriate subset.

索引可能会使任一查询更高效,但它不会从根本上影响 distinct 处理发生在子查询之前还是之后.

An index might make either query more efficient, but it won't fundamentally affect whether the distinct processing occurs before or after the subquery.

在这种情况下,结果是相同的.但是,这取决于子查询.

In this case, the results are the same. However, that is not necessarily true depending on the subquery.

这篇关于使用 DISTINCT 查询等价的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆