如何使用排序、分区和分组进行行编号 [英] How to make row numbering with ordering, partitioning and grouping

查看:32
本文介绍了如何使用排序、分区和分组进行行编号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要通过排序、分区和分组进行行编号.按IdDocument, DateChange 排序,按IdDocument 分区,按IdRole 分组.问题尤其在于分组.从示例中可以看出 (NumberingExpected) DENSE_RANK() 必须是用于此目的的最佳函数,但仅当用于顺序是一样的.在我的例子中,用于排序的值 (IdDocument, DateChange) 总是不同的,重复编号必须由 IdRole 完成.

I need to make row numbering with ordering, partitioning and grouping. Ordering by IdDocument, DateChange, partitioning by IdDocument and grouping by IdRole. The problem is in grouping especially. As it could be seen from the example (NumberingExpected) DENSE_RANK() must be the best function for this purpose but it makes repetition of numbering only when the values which are used to order are the same. In my case values used for ordering (IdDocument, DateChange) are always different and repetition of numbering must be done by IdRole.

当然可以很容易地通过使用游标来解决.但是有没有办法用编号/排名功能来实现它?

Sure it could be solved by the usage of cursor very easy. But is there any way to make it with numbering/ranking functions?

测试数据:

declare @LogTest as table (
    Id INT
    ,IdRole INT
    ,DateChange DATETIME
    ,IdDocument INT
    ,NumberingExpected INT
)
insert into @LogTest
select 1 as Id, 7 as IdRole, GETDATE() as DateChange, 13 as IdDocument, 1 as NumberingExpected
union 
select 2, 3, DATEADD(HH, 1, GETDATE()), 13, 2
union 
select 3, 3, DATEADD(HH, 2, GETDATE()), 13, 2
union 
select 4, 3, DATEADD(HH, 3, GETDATE()), 13, 2
union 
select 5, 5, DATEADD(HH, 4, GETDATE()), 13, 3
union 
select 7, 3, DATEADD(HH, 6, GETDATE()), 13, 4
union 
select 6, 3, DATEADD(HH, 5, GETDATE()), 27, 1
union 
select 8, 3, DATEADD(HH, 7, GETDATE()), 27, 1
union 
select 9, 5, DATEADD(HH, 8, GETDATE()), 27, 2
union 
select 10, 3, DATEADD(HH, 9, GETDATE()), 27, 3


select * from @LogTest order by IdDocument, DateChange;

函数式编程的解释:

  1. 按 IdDocument、DateChange 订购数据
  2. 将第一行编号设置为 i=1 转到下一行
  3. 如果 IdDocument 已更改{我=1;}别的 {如果 IdRow 已更改 { i++;}}
  4. 设置行号为 i;
  5. 转到下一行;
  6. IF EOF { 退出;} else { 转到第 3 步;}

推荐答案

从 2012 年开始可以使用 LAG/LEAD,但在 2008 年不可用,所以我们会效仿它.性能可能很差,您应该检查您的实际数据.

Since 2012 you could use LAG/LEAD, but in 2008 it is not available, so we'll emulate it. Performance could be poor, you should check with your actual data.

这是最后的查询:

WITH
CTE_rn
AS
(
    SELECT
        Main.IdRole
        ,Main.IdDocument
        ,Main.DateChange
        ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
    FROM
        @LogTest AS Main
        OUTER APPLY
        (
            SELECT TOP (1) T.IdRole
            FROM @LogTest AS T
            WHERE
                T.IdDocument = Main.IdDocument
                AND T.DateChange < Main.DateChange
            ORDER BY T.DateChange DESC
        ) AS Prev
    WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
)
SELECT *
FROM
    @LogTest AS LT
    CROSS APPLY
    (
        SELECT TOP(1) CTE_rn.rn
        FROM CTE_rn
        WHERE
            CTE_rn.IdDocument = LT.IdDocument
            AND CTE_rn.IdRole = LT.IdRole
            AND CTE_rn.DateChange <= LT.DateChange
        ORDER BY CTE_rn.DateChange DESC
    ) CA_rn
ORDER BY IdDocument, DateChange;

最终结果集:

Id    IdRole    DateChange                 IdDocument    NumberingExpected    rn
1     7         2015-01-26 20:00:41.210    13            1                    1
2     3         2015-01-26 21:00:41.210    13            2                    2
3     3         2015-01-26 22:00:41.210    13            2                    2
4     3         2015-01-26 23:00:41.210    13            2                    2
5     5         2015-01-27 00:00:41.210    13            3                    3
7     3         2015-01-27 02:00:41.210    13            4                    4
6     3         2015-01-27 01:00:41.210    27            1                    1
8     3         2015-01-27 03:00:41.210    27            1                    1
9     5         2015-01-27 04:00:41.210    27            2                    2
10    3         2015-01-27 05:00:41.210    27            3                    3

工作原理

1) 当表按 IdDocument 和 DateChange 排序时,我们需要上一行的 IdRole 值.为了得到它,我们使用 OUTER APPLY(因为 LAG 不可用):

SELECT *
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
ORDER BY Main.IdDocument, Main.DateChange;

这是第一步的结果集:

Id    IdRole    DateChange                 IdDocument    NumberingExpected    IdRole
1     7         2015-01-26 20:50:32.560    13            1                    NULL
2     3         2015-01-26 21:50:32.560    13            2                    7
3     3         2015-01-26 22:50:32.560    13            2                    3
4     3         2015-01-26 23:50:32.560    13            2                    3
5     5         2015-01-27 00:50:32.560    13            3                    3
7     3         2015-01-27 02:50:32.560    13            4                    5
6     3         2015-01-27 01:50:32.560    27            1                    NULL
8     3         2015-01-27 03:50:32.560    27            1                    3
9     5         2015-01-27 04:50:32.560    27            2                    3
10    3         2015-01-27 05:50:32.560    27            3                    5

2) 我们想要删除具有重复 IdRole 的行,因此我们添加一个 WHERE 并对行进行编号.可以看到行号跟预期的结果一致:

2) We want to remove rows with repeating IdRole, so we add a WHERE and number the rows. You can see that row numbers follow the expected result:

SELECT
    Main.IdRole
    ,Main.IdDocument
    ,Main.DateChange
    ,ROW_NUMBER() OVER(PARTITION BY Main.IdDocument ORDER BY Main.DateChange) AS rn
FROM
    @LogTest AS Main
    OUTER APPLY
    (
        SELECT TOP (1) T.IdRole
        FROM @LogTest AS T
        WHERE
            T.IdDocument = Main.IdDocument
            AND T.DateChange < Main.DateChange
        ORDER BY T.DateChange DESC
    ) AS Prev
WHERE Main.IdRole <> Prev.IdRole OR Prev.IdRole IS NULL
;

这是这一步的结果集(成为CTE):

This is result set of this step (it becomes the CTE):

IdRole    IdDocument    DateChange                 rn
7         13            2015-01-26 20:13:26.247    1
3         13            2015-01-26 21:13:26.247    2
5         13            2015-01-27 00:13:26.247    3
3         13            2015-01-27 02:13:26.247    4
3         27            2015-01-27 01:13:26.247    1
5         27            2015-01-27 04:13:26.247    2
3         27            2015-01-27 05:13:26.247    3

3) 最后,我们需要从 CTE 中为原始表的每一行获取正确的行号.我使用 CROSS APPLY 从 CTE 中为原始表的每一行获取一行.

3) Finally, we need to get the correct row number from CTE for each row of the original table. I use CROSS APPLY to get one row from CTE for each row of the original table.

这篇关于如何使用排序、分区和分组进行行编号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆