SQL Server通过聚合选择随机(或第一个)值 [英] SQL Server pick random (or first) value with aggregation

查看:502
本文介绍了SQL Server通过聚合选择随机(或第一个)值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何获取SQL Server返回的第一个值(我不在乎,它只需要快速)就可以在汇总时遇到?

How can I get SQL Server to return the first value (any one, I don't care, it just needs to be fast) it comes across when aggregating?

例如,假设我有:

ID      Group
1       A
2       A
3       A
4       B
5       B

我需要每个组的ID的任意一个。我可以这样做,如下所示:

and I need to get any one of the ID's for each group. I can do this as follows:

Select 
max(id)
,group 
from Table 
group by group

返回

ID      Group
3       A
5       B

就可以了,但是当我真正需要做的就是选择遇到的第一个ID时,要求SQL Server计算最高的ID对我来说似乎很愚蠢。

That does the job, but it seems stupid to me to ask SQL Server to calculate the highest ID when all it really needs to do is to pick the first ID it comes across.

谢谢

PS-字段已编制索引,因此也许并没有真正的作用?

PS - the fields are indexed, so maybe it doesn't really make a difference?

推荐答案

有一个 ANY 的未记录聚合,该语法无效,但可能会出现在您的执行计划中。

There is an undocumented aggregate called ANY which is not valid syntax but is possible to get to appear in your execution plans. This does not provide any performance advantage however.

假定下表和索引结构

CREATE TABLE T
(
id int identity primary key,
[group] char(1) 
)

CREATE NONCLUSTERED INDEX ix ON T([group])

INSERT INTO T
SELECT TOP 1000000 CHAR( 65 + ROW_NUMBER() OVER (ORDER BY @@SPID) % 3)
FROM sys.all_objects o1, sys.all_objects o2, sys.all_objects o3

我也填充了示例数据,例如

I have also populated with sample data such that there are many rows per group.

您的原始查询

SELECT MAX(id),
       [group]
FROM   T
GROUP  BY [group]  

提供表'T'。扫描计数1,逻辑读取1367 和计划

  |--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([Expr1003]=MAX([[T].[id])))
       |--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)

重写以得到 ANY 总计...

Rewritten to get the ANY aggregate...

;WITH cte AS
(
SELECT *,
        ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [group] ) AS RN
FROM T)
SELECT id,
       [group]
FROM    cte     
WHERE RN=1

给出表'T'。扫描计数1,逻辑读取1367 和计划

  |--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([[T].[id]=ANY([[T].[id])))
       |--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)

即使SQL Server可能停止处理该组一旦找到第一个值,就跳到下一个没有的值。

Even though potentially SQL Server could stop processing the group as soon as the first value is found and skip to the next one it doesn't. It still processes all rows and the logical reads are the same.

对于该示例,如果该组中有许多行,则更有效的版本将是递归CTE。

For this particular example with many rows in the group a more efficient version would be a recursive CTE.

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 id, [group]
        FROM T
        ORDER BY [group]
        UNION   ALL
        SELECT  R.id, R.[group]
        FROM    (
                SELECT  T.*,
                        rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
                FROM    T
                JOIN    RecursiveCTE R
                        ON  R.[group] < T.[group]
                ) R
        WHERE   R.rn = 1
        )
SELECT  *
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);

哪个给予

Table 'Worktable'. Scan count 2, logical reads 19
Table 'T'. Scan count 4, logical reads 12

逻辑读取要少得多,因为它检索每个组的第一行然后查找下一组,而不是读取大量对最终结果没有帮助的记录。

The logical reads are much less as it retrieves the first row per group then seeks into the next group rather than reading a load of records that don't contribute to the final result.

这篇关于SQL Server通过聚合选择随机(或第一个)值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆