我可以使用SQL Server CTE合并相交的日期吗? [英] Can I use a SQL Server CTE to merge intersecting dates?

查看:49
本文介绍了我可以使用SQL Server CTE合并相交的日期吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个应用程序,该应用程序可以处理一些员工的排班时间。作为此过程的一部分,我需要计算一天中他们请求关闭的时间。

I'm writing an app that handles scheduling time off for some of our employees. As part of this, I need to calculate how many minutes throughout the day that they have requested off.

在此工具的第一个版本中,我们不允许重叠的请求时间,因为我们希望能够将所有请求的总计 StartTime 减去 EndTime 。防止重叠可以使计算变得非常快。

In the first version of this tool, we disallowed overlapping time off requests, because we wanted to be able to just add up the total of StartTime minus EndTime for all requests. Preventing overlaps makes this calculation very fast.

这已经成为问题,因为经理现在想安排团队会议,但是当有人已经要求一天会议时却无法安排会议关闭。

This has become problematic, because Managers now want to schedule team meetings but are unable to do so when someone has already asked for the day off.

因此,在该工具的新版本中,我们要求允许重叠的请求。

So, in the new version of the tool, we have a requirement to allow overlapping requests.

下面是一个示例数据集,例如:

Here is an example set of data like what we have:

UserId | StartDate | EndDate
----------------------------
 1     | 2:00      | 4:00
 1     | 3:00      | 5:00
 1     | 3:45      | 9:00
 2     | 6:00      | 9:00
 2     | 7:00      | 8:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00

我需要尽可能高效地获得以下结果:

The result that I need to get, as efficiently as possible, is this:

UserId | StartDate | EndDate
----------------------------
 1     | 2:00      | 9:00
 2     | 6:00      | 9:00
 3     | 2:00      | 3:00
 3     | 4:00      | 5:00
 4     | 1:00      | 7:00

我们可以使用以下查询轻松检测到重叠:

We can easily detect overlaps with this query:

select
    *
from
    requests r1
cross join
    requests r2
where
    r1.RequestId < r2.RequestId
  and
    r1.StartTime < r2.EndTime
  and
    r2.StartTime < r1.EndTime

实际上,这就是我们最初检测和预防问题的方式。

This is, in fact, how we were detecting and preventing the problems originally.

现在,我们正在尝试合并重叠的项目,但是我已经达到了我的SQL忍者技能的极限。

Now, we are trying to merge the overlapping items, but I'm reaching the limits of my SQL ninja skills.

想出一种使用临时表的方法并不难,但我们要尽可能避免这种情况。

It wouldn't be too hard to come up with a method using temp tables, but we want to avoid this if at all possible.

是否存在

所有行都显示出来也是可以接受的,只要它们被折叠成它们的时间即可。例如,如果某人想要从三到五,从四到六,那么他们可以接受两排,一排从三排到五排,下一排从五排到六排,或者一排从三排到四排,然后接下来是4到6。

It would also be acceptable for the all of the rows to show up, as long as they were collapsed into just their time. For example if someone wants off from three to five, and from four to six, it would be acceptable for them to have two rows, one from three to five, and the next from five to six OR one from three to four, and the next from four to six.

此外,这是一个小测试台:

Also, here is a little test bench:

DECLARE @requests TABLE
(
    UserId int,
    StartDate time,
    EndDate time
)

INSERT INTO @requests (UserId, StartDate, EndDate) VALUES
(1, '2:00', '4:00'),
(1, '3:00', '5:00'),
(1, '3:45', '9:00'),
(2, '6:00', '9:00'),
(2, '7:00', '8:00'),
(3, '2:00', '3:00'),
(3, '4:00', '5:00'),
(4, '1:00', '7:00');


推荐答案

好的,可以使用CTE。我不知道如何在夜晚开始时使用它们,但这是我的研究结果:

Ok, it is possible to do with CTEs. I did not know how to use them at the beginning of the night, but here is the results of my research:

递归CTE有两部分,锚点语句和递归语句。

A recursive CTE has 2 parts, the "anchor" statement and the "recursive" statements.

关于递归语句的关键部分是,当对它进行评估时,只有尚未评估的行才会显示在

The crucial part about the recursive statement is that when it is evaluated, only the rows that have not already been evaluated will show up in the recursion.

因此,例如,如果我们想使用CTE来获取这些用户的全包时间列表,则可以使用以下内容:

So, for example, if we wanted to use CTEs to get an all-inclusive list of times for these users, we could use something like this:

WITH
  sorted_requests as (
    SELECT
        UserId, StartDate, EndDate,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY StartDate, EndDate DESC) Instance
    FROM @requests
  ),
  no_overlap(UserId, StartDate, EndDate, Instance) as (
    SELECT *
    FROM sorted_requests
    WHERE Instance = 1

    UNION ALL

    SELECT s.*
    FROM sorted_requests s
    INNER JOIN no_overlap n
    ON s.UserId = n.UserId
    AND s.Instance = n.Instance + 1
  )
SELECT *
FROM no_overlap

这里, anchor语句只是每个用户的第一个实例, WHERE实例= 1

Here, the "anchor" statement is just the first instance for every user, WHERE Instance = 1.

递归语句使用将每一行连接到集合中的下一行.UserId = n.UserId AND s.Instance = n.Instance +1

现在,我们可以使用数据的属性,当按开始日期排序,则任何重叠的行的开始日期都将小于上一行的结束日期。如果我们不断传播第一个相交行的行号,则每个随后的重叠行将共享该行号。

Now, we can use the property of the data, when sorted by start date, that any overlapping row will have a start date that is less than the previous row's end date. If we continually propagate the row number of the first intersecting row, every subsequent overlapping row will share that row number.

使用此查询:

WITH
  sorted_requests as (
    SELECT
        UserId, StartDate, EndDate,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY StartDate, EndDate DESC) Instance
    FROM
        @requests
  ),
  no_overlap(UserId, StartDate, EndDate, Instance, ConnectedGroup) as (
    SELECT
        UserId,
        StartDate,
        EndDate,
        Instance,
        Instance as ConnectedGroup
    FROM sorted_requests
    WHERE Instance = 1

    UNION ALL

    SELECT
        s.UserId,
        s.StartDate,
        CASE WHEN n.EndDate >= s.EndDate
            THEN n.EndDate
            ELSE s.EndDate
        END EndDate,
        s.Instance,
        CASE WHEN n.EndDate >= s.StartDate
            THEN n.ConnectedGroup
            ELSE s.Instance
        END ConnectedGroup
    FROM sorted_requests s
    INNER JOIN no_overlap n
    ON s.UserId = n.UserId AND s.Instance = n.Instance + 1
  )
SELECT
    UserId,
    MIN(StartDate) StartDate,
    MAX(EndDate) EndDate
FROM no_overlap
GROUP BY UserId, ConnectedGroup
ORDER BY UserId

我们将上述第一个相交行(在此查询中称为 ConnectedGroup )分组,并找到该组中的最小开始时间和最大结束时间。

We group by the aforementioned "first intersecting row" (called ConnectedGroup in this query) and find the minimum start time and maximum end time in that group.

第一个相交的行使用以下语句传播:

The first intersecting row is propagated using this statement:

CASE WHEN n.EndDate >= s.StartDate
    THEN n.ConnectedGroup
    ELSE s.Instance
END ConnectedGroup

基本上是这样说的:如果此行与上一行相交(基于对我们进行排序(按开始日期),然后考虑此行与上一行具有相同的行分组。否则,请将该行自己的行号用作自身的行分组。

Which basically says, "if this row intersects with the previous row (based on us being sorted by start date), then consider this row to have the same 'row grouping' as the previous row. Otherwise, use this row's own row number as the 'row grouping' for itself."

这将为我们提供所需的确切信息。

This gives us exactly what we were looking for.

编辑

当我最初以为这是白板上的内容时,我知道我必须前进每行的 EndDate ,以确保它与下一行相交(如果所连接的组中的前几行有相交的话),我无意中忽略了这一点。这已得到纠正。

When I had originally thought this up on my whiteboard, I knew that I would have to advance the EndDate of each row, to ensure that it would intersect with the next row, if any of the previous rows in the connected group would have intersected. I accidentally left that out. This has been corrected.

这篇关于我可以使用SQL Server CTE合并相交的日期吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆