Join overlapping date ranges

Problem description

I need to join table A and table B to create table C.

Table A and Table B store status flags for the IDs. The status flags (A_Flag and B_Flag) can change from time to time, so one ID can have multiple rows, which together represent the history of the ID's statuses. The flags for a particular ID can change independently of each other, so one row in Table A can overlap multiple rows in Table B, and vice versa.

The resulting table (Table C) needs to be a list of unique date ranges covering every date within the ID's life (01/01/2008-18/08/2008), with the A_Flag and B_Flag values for each date range.

The actual tables contain hundreds of IDs, with each ID having a varying number of rows per table.

I have access to SQL and SAS tools to achieve the end result.

Source - Table A
ID  Start       End         A_Flag
1   01/01/2008  23/03/2008  1
1   23/03/2008  15/06/2008  0
1   15/06/2008  18/08/2008  1

Source - Table B
ID  Start       End         B_Flag
1   19/01/2008  17/02/2008  1
1   17/02/2008  15/06/2008  0
1   15/06/2008  18/08/2008  1

Result - Table C
ID  Start       End         A_Flag  B_Flag
1   01/01/2008  19/01/2008  1       0
1   19/01/2008  17/02/2008  1       1
1   17/02/2008  23/03/2008  1       0
1   23/03/2008  15/06/2008  0       0
1   15/06/2008  18/08/2008  1       1

Recommended answer

The problem you posed can be solved in one SQL statement without nonstandard extensions.

The most important thing to recognize is that the dates in the begin-end pairs each represent a potential starting or ending point of a time span during which the flag pair will be true. It actually doesn't matter that one date is a "begin" and another an "end"; any date is a time delimiter that does both: it ends a prior period and begins another. Construct a set of minimal time intervals, and join them to the tables to find the flags that obtained during each interval.
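
The idea can be sketched procedurally before looking at any SQL. The following Python is only an illustration of the boundary technique described above, not part of the original answer; the helper names (flag_for, merge_ranges) are hypothetical, and the rows are the sample data from the question.

```python
from datetime import date

# Sample rows from the question: (ID, start, end, flag)
table_a = [
    (1, date(2008, 1, 1), date(2008, 3, 23), 1),
    (1, date(2008, 3, 23), date(2008, 6, 15), 0),
    (1, date(2008, 6, 15), date(2008, 8, 18), 1),
]
table_b = [
    (1, date(2008, 1, 19), date(2008, 2, 17), 1),
    (1, date(2008, 2, 17), date(2008, 6, 15), 0),
    (1, date(2008, 6, 15), date(2008, 8, 18), 1),
]


def flag_for(rows, id_, start, end):
    """Return the flag of the row that fully covers [start, end], or None."""
    for row_id, row_start, row_end, flag in rows:
        if row_id == id_ and row_start <= start and end <= row_end:
            return flag
    return None


def merge_ranges(a_rows, b_rows):
    # Every start or end date, from either table, is a boundary for its ID.
    bounds = {}
    for row_id, start, end, _ in a_rows + b_rows:
        bounds.setdefault(row_id, set()).update((start, end))

    result = []
    for row_id in sorted(bounds):
        ordered = sorted(bounds[row_id])
        # Each pair of adjacent boundaries is one minimal interval.
        for start, end in zip(ordered, ordered[1:]):
            result.append((
                row_id, start, end,
                flag_for(a_rows, row_id, start, end),
                flag_for(b_rows, row_id, start, end),
            ))
    return result


table_c = merge_ranges(table_a, table_b)
for row in table_c:
    print(row)
```

For the sample data this yields five intervals. The first one (01/01/2008 to 19/01/2008) has no covering row in Table B, so its B flag comes back as None rather than the 0 shown in the desired Table C; whether a missing row should be treated as 0 is a decision for the asker.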

I added your example (and a solution) to my Canonical SQL page. See there for a detailed discussion. In fairness to SO, here's the query itself:

-- D: every start or end date from either table, as one list of boundaries per ID
with D (ID, bound) as (
    select   ID 
       , case T when 's' then StartDate else EndDate end as bound
    from  (
    select ID, StartDate, EndDate from so.A 
    UNION
    select ID, StartDate, EndDate from so.B
    ) as U
    -- two copies of each row: one emits the start date, the other the end date
    cross join (select 's' as T union select 'e') as T
)
select P.*, a.Flag as A_Flag, b.Flag as B_Flag
from (
    -- P: minimal intervals, each boundary paired with the next later boundary
    select s.ID, s.bound as StartDate, min(e.bound) as EndDate
    from D as s join D as e 
    on s.ID = e.ID 
    and s.bound < e.bound
    group by s.ID, s.bound
) as P
-- left joins keep intervals that one of the tables does not cover (flag is NULL)
left join so.A as a
on  P.ID = a.ID 
and a.StartDate <= P.StartDate and P.EndDate <= a.EndDate
left join so.B as b
on  P.ID = b.ID 
and b.StartDate <= P.StartDate and P.EndDate <= b.EndDate
order by P.ID, P.StartDate, P.EndDate
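
To try the query against the sample data, one option is Python's built-in sqlite3 module. This is a sketch under assumptions that are not in the original answer: an in-memory SQLite database attached under the schema name so, dates stored as ISO-8601 text (so that plain string comparison orders them correctly), and a Flag column in each table as the query expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: attach a second in-memory database as "so" so that the
# schema-qualified names so.A and so.B used by the query resolve.
conn.executescript("""
attach database ':memory:' as so;
create table so.A (ID integer, StartDate text, EndDate text, Flag integer);
create table so.B (ID integer, StartDate text, EndDate text, Flag integer);
insert into so.A values
    (1, '2008-01-01', '2008-03-23', 1),
    (1, '2008-03-23', '2008-06-15', 0),
    (1, '2008-06-15', '2008-08-18', 1);
insert into so.B values
    (1, '2008-01-19', '2008-02-17', 1),
    (1, '2008-02-17', '2008-06-15', 0),
    (1, '2008-06-15', '2008-08-18', 1);
""")

# The query from the answer, unchanged apart from whitespace.
QUERY = """
with D (ID, bound) as (
    select ID,
           case T when 's' then StartDate else EndDate end as bound
    from (
        select ID, StartDate, EndDate from so.A
        union
        select ID, StartDate, EndDate from so.B
    ) as U
    cross join (select 's' as T union select 'e') as T
)
select P.*, a.Flag as A_Flag, b.Flag as B_Flag
from (
    select s.ID, s.bound as StartDate, min(e.bound) as EndDate
    from D as s join D as e
      on s.ID = e.ID
     and s.bound < e.bound
    group by s.ID, s.bound
) as P
left join so.A as a
  on P.ID = a.ID
 and a.StartDate <= P.StartDate and P.EndDate <= a.EndDate
left join so.B as b
  on P.ID = b.ID
 and b.StartDate <= P.StartDate and P.EndDate <= b.EndDate
order by P.ID, P.StartDate, P.EndDate
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because of the left joins, the interval from 2008-01-01 to 2008-01-19 is returned with a NULL B_Flag (None in Python), since no row in B covers it. If the 0 shown in the desired Table C is what is wanted, wrapping the flag columns as coalesce(a.Flag, 0) and coalesce(b.Flag, 0) would produce it, assuming that a missing row really should count as 0.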
