SQL Server通过聚合选择随机(或第一个)值 [英] SQL Server pick random (or first) value with aggregation
问题描述
如何获取SQL Server返回的第一个值(我不在乎,它只需要快速)就可以在汇总时遇到?
How can I get SQL Server to return the first value (any one, I don't care, it just needs to be fast) it comes across when aggregating?
例如,假设我有:
ID Group
1 A
2 A
3 A
4 B
5 B
我需要每个组的ID的任意一个。我可以这样做,如下所示:
and I need to get any one of the ID's for each group. I can do this as follows:
Select
max(id)
,group
from Table
group by group
返回
ID Group
3 A
5 B
就可以了,但是当我真正需要做的就是选择遇到的第一个ID时,要求SQL Server计算最高的ID对我来说似乎很愚蠢。
That does the job, but it seems stupid to me to ask SQL Server to calculate the highest ID when all it really needs to do is to pick the first ID it comes across.
谢谢
PS-字段已编制索引,因此也许并没有真正的作用?
PS - the fields are indexed, so maybe it doesn't really make a difference?
推荐答案
有一个 ANY 的未记录聚合,该语法无效,但可能会出现在您的执行计划中。
There is an undocumented aggregate called ANY
which is not valid syntax but is possible to get to appear in your execution plans. This does not provide any performance advantage however.
假定下表和索引结构
CREATE TABLE T
(
id int identity primary key,
[group] char(1)
)
CREATE NONCLUSTERED INDEX ix ON T([group])
INSERT INTO T
SELECT TOP 1000000 CHAR( 65 + ROW_NUMBER() OVER (ORDER BY @@SPID) % 3)
FROM sys.all_objects o1, sys.all_objects o2, sys.all_objects o3
我也填充了示例数据,例如
I have also populated with sample data such that there are many rows per group.
您的原始查询
SELECT MAX(id),
[group]
FROM T
GROUP BY [group]
提供表'T'。扫描计数1,逻辑读取1367
和计划
|--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([Expr1003]=MAX([[T].[id])))
|--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)
重写以得到 ANY
总计...
Rewritten to get the ANY
aggregate...
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [group] ) AS RN
FROM T)
SELECT id,
[group]
FROM cte
WHERE RN=1
给出表'T'。扫描计数1,逻辑读取1367
和计划
|--Stream Aggregate(GROUP BY:([[T].[group]) DEFINE:([[T].[id]=ANY([[T].[id])))
|--Index Scan(OBJECT:([[T].[ix]), ORDERED FORWARD)
即使SQL Server可能停止处理该组一旦找到第一个值,就跳到下一个没有的值。
Even though potentially SQL Server could stop processing the group as soon as the first value is found and skip to the next one it doesn't. It still processes all rows and the logical reads are the same.
对于该示例,如果该组中有许多行,则更有效的版本将是递归CTE。
For this particular example with many rows in the group a more efficient version would be a recursive CTE.
WITH RecursiveCTE
AS (
SELECT TOP 1 id, [group]
FROM T
ORDER BY [group]
UNION ALL
SELECT R.id, R.[group]
FROM (
SELECT T.*,
rn = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM T
JOIN RecursiveCTE R
ON R.[group] < T.[group]
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
哪个给予
Table 'Worktable'. Scan count 2, logical reads 19
Table 'T'. Scan count 4, logical reads 12
逻辑读取要少得多,因为它检索每个组的第一行然后查找下一组,而不是读取大量对最终结果没有帮助的记录。
The logical reads are much less as it retrieves the first row per group then seeks into the next group rather than reading a load of records that don't contribute to the final result.
这篇关于SQL Server通过聚合选择随机(或第一个)值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!