如何使用排序、分区和分组进行行编号 [英] How to make row numbering with ordering, partitioning and grouping
问题描述
我需要通过排序、分区和分组进行行编号.按IdDocument, DateChange
排序,按IdDocument
分区,按IdRole
分组.问题尤其在于分组.从示例中可以看出 (NumberingExpected
) DENSE_RANK()
必须是用于此目的的最佳函数,但仅当用于顺序是一样的.在我的例子中,用于排序的值 (IdDocument, DateChange
) 总是不同的,重复编号必须由 IdRole
完成.
I need to make row numbering with ordering, partitioning and grouping. Ordering by IdDocument, DateChange
, partitioning by IdDocument
and grouping by IdRole
. The problem is in grouping especially. As it could be seen from the example (NumberingExpected
) DENSE_RANK()
must be the best function for this purpose but it makes repetition of numbering only when the values which are used to order are the same. In my case values used for ordering (IdDocument, DateChange
) are always different and repetition of numbering must be done by IdRole
.
当然可以很容易地通过使用游标来解决.但是有没有办法用编号/排名功能来实现它?
Sure it could be solved by the usage of cursor very easy. But is there any way to make it with numbering/ranking functions?
测试数据:
declare @LogTest as table (
Id INT
,IdRole INT
,DateChange DATETIME
,IdDocument INT
,NumberingExpected INT
)
insert into @LogTest
select 1 as Id, 7 as IdRole, GETDATE() as DateChange, 13 as IdDocument, 1 as NumberingExpected
union
select 2, 3, DATEADD(HH, 1, GETDATE()), 13, 2
union
select 3, 3, DATEADD(HH, 2, GETDATE()), 13, 2
union
select 4, 3, DATEADD(HH, 3, GETDATE()), 13, 2
union
select 5, 5, DATEADD(HH, 4, GETDATE()), 13, 3
union
select 7, 3, DATEADD(HH, 6, GETDATE()), 13, 4
union
select 6, 3, DATEADD(HH, 5, GETDATE()), 27, 1
union
select 8, 3, DATEADD(HH, 7, GETDATE()), 27, 1
union
select 9, 5, DATEADD(HH, 8, GETDATE()), 27, 2
union
select 10, 3, DATEADD(HH, 9, GETDATE()), 27, 3
select * from @LogTest order by IdDocument, DateChange;
函数式编程的解释:
- 按 IdDocument、DateChange 订购数据
- 将第一行编号设置为 i=1 转到下一行
- 如果 IdDocument 已更改{我=1;}别的 {如果 IdRow 已更改 { i++;}}
- 设置行号为 i;
- 转到下一行;
- IF EOF { 退出;} else { 转到第 3 步;}
推荐答案
从 2012 年开始可以使用 LAG
/LEAD
,但在 2008 年不可用,所以我们会效仿它.性能可能很差,您应该检查您的实际数据.
Since 2012 you could use LAG
/LEAD
, but in 2008 it is not available, so we'll emulate it. Performance could be poor, you should check with your actual data.
这是最后的查询:
WITH
CTE_rn
AS
(
SELECT
Main.IdRole
,Main.IdDocument
,Main.DateChange
,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
FROM
@LogTest AS Main
OUTER APPLY
(
SELECT TOP (1) T.IdRole
FROM @LogTest AS T
WHERE
T.IdDocument = Main.IdDocument
AND T.DateChange < Main.DateChange
ORDER BY T.DateChange DESC
) AS Prev
WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
)
SELECT *
FROM
@LogTest AS LT
CROSS APPLY
(
SELECT TOP(1) CTE_rn.rn
FROM CTE_rn
WHERE
CTE_rn.IdDocument = LT.IdDocument
AND CTE_rn.IdRole = LT.IdRole
AND CTE_rn.DateChange <= LT.DateChange
ORDER BY CTE_rn.DateChange DESC
) CA_rn
ORDER BY IdDocument, DateChange;
最终结果集:
Id IdRole DateChange IdDocument NumberingExpected rn
1 7 2015-01-26 20:00:41.210 13 1 1
2 3 2015-01-26 21:00:41.210 13 2 2
3 3 2015-01-26 22:00:41.210 13 2 2
4 3 2015-01-26 23:00:41.210 13 2 2
5 5 2015-01-27 00:00:41.210 13 3 3
7 3 2015-01-27 02:00:41.210 13 4 4
6 3 2015-01-27 01:00:41.210 27 1 1
8 3 2015-01-27 03:00:41.210 27 1 1
9 5 2015-01-27 04:00:41.210 27 2 2
10 3 2015-01-27 05:00:41.210 27 3 3
工作原理
1) 当表按 IdDocument 和 DateChange 排序时,我们需要上一行的 IdRole 值.为了得到它,我们使用 OUTER APPLY
(因为 LAG
不可用):
SELECT *
FROM
@LogTest AS Main
OUTER APPLY
(
SELECT TOP (1) T.IdRole
FROM @LogTest AS T
WHERE
T.IdDocument = Main.IdDocument
AND T.DateChange < Main.DateChange
ORDER BY T.DateChange DESC
) AS Prev
ORDER BY Main.IdDocument, Main.DateChange;
这是第一步的结果集:
Id IdRole DateChange IdDocument NumberingExpected IdRole
1 7 2015-01-26 20:50:32.560 13 1 NULL
2 3 2015-01-26 21:50:32.560 13 2 7
3 3 2015-01-26 22:50:32.560 13 2 3
4 3 2015-01-26 23:50:32.560 13 2 3
5 5 2015-01-27 00:50:32.560 13 3 3
7 3 2015-01-27 02:50:32.560 13 4 5
6 3 2015-01-27 01:50:32.560 27 1 NULL
8 3 2015-01-27 03:50:32.560 27 1 3
9 5 2015-01-27 04:50:32.560 27 2 3
10 3 2015-01-27 05:50:32.560 27 3 5
2) 我们想要删除具有重复 IdRole 的行,因此我们添加一个 WHERE
并对行进行编号.可以看到行号跟预期的结果一致:
2) We want to remove rows with repeating IdRole, so we add a WHERE
and number the rows. You can see that row numbers follow the expected result:
SELECT
Main.IdRole
,Main.IdDocument
,Main.DateChange
,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
FROM
@LogTest AS Main
OUTER APPLY
(
SELECT TOP (1) T.IdRole
FROM @LogTest AS T
WHERE
T.IdDocument = Main.IdDocument
AND T.DateChange < Main.DateChange
ORDER BY T.DateChange DESC
) AS Prev
WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
;
这是这一步的结果集(成为CTE):
This is result set of this step (it becomes the CTE):
IdRole IdDocument DateChange rn
7 13 2015-01-26 20:13:26.247 1
3 13 2015-01-26 21:13:26.247 2
5 13 2015-01-27 00:13:26.247 3
3 13 2015-01-27 02:13:26.247 4
3 27 2015-01-27 01:13:26.247 1
5 27 2015-01-27 04:13:26.247 2
3 27 2015-01-27 05:13:26.247 3
3) 最后,我们需要从 CTE 中为原始表的每一行获取正确的行号.我使用 CROSS APPLY
从 CTE 中为原始表的每一行获取一行.
3) Finally, we need to get the correct row number from CTE for each row of the original table. I use CROSS APPLY
to get one row from CTE for each row of the original table.
这篇关于如何使用排序、分区和分组进行行编号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!