Joining together consecutive date validity intervals

Question

I have a series of records containing some information (product type) with temporal validity.

I would like to meld together adjacent validity intervals, provided that the grouping information (the product type) stays the same. I cannot use a simple GROUP BY with MIN and MAX, because some product types (A, in the example) can "go away" and "come back".
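To illustrate why the simple aggregate is not enough, here is a minimal sketch of the naive query. It assumes the rows live in a table named `t` with `DATE` columns `start_date` and `end_date`, which is the table name the answer below uses.

```sql
-- Naive attempt: one row per product, ignoring the gap where A "goes away".
select product,
  min(start_date) as start_date,
  max(end_date) as end_date
from t
group by product
order by min(start_date);

-- With the sample data this returns only two rows
-- (date display depends on NLS_DATE_FORMAT):
--   A  01-JUL-13  31-MAR-14
--   B  01-OCT-13  30-NOV-13
-- The single A row wrongly spans October and November, when B was valid.
```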

Using Oracle 11g.

A similar question for MySQL is: How can I do a contiguous group by in MySQL?

Input data:

| PRODUCT |                       START_DATE |                         END_DATE |
|---------|----------------------------------|----------------------------------|
|       A |      July, 01 2013 00:00:00+0000 |      July, 31 2013 00:00:00+0000 |
|       A |    August, 01 2013 00:00:00+0000 |    August, 31 2013 00:00:00+0000 |
|       A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |   October, 01 2013 00:00:00+0000 |   October, 31 2013 00:00:00+0000 |
|       B |  November, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A |  December, 01 2013 00:00:00+0000 |  December, 31 2013 00:00:00+0000 |
|       A |   January, 01 2014 00:00:00+0000 |   January, 31 2014 00:00:00+0000 |
|       A |  February, 01 2014 00:00:00+0000 |  February, 28 2014 00:00:00+0000 |
|       A |     March, 01 2014 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |
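The sample rows above could be loaded with something like the following setup script. The table name `t` matches the answer below; the column types (`VARCHAR2(1)` and `DATE`) are assumptions, since the original schema is only available through the linked fiddle.

```sql
-- Assumed schema: the column types are a guess based on the sample data.
create table t (
  product    varchar2(1),
  start_date date,
  end_date   date
);

insert into t values ('A', date '2013-07-01', date '2013-07-31');
insert into t values ('A', date '2013-08-01', date '2013-08-31');
insert into t values ('A', date '2013-09-01', date '2013-09-30');
insert into t values ('B', date '2013-10-01', date '2013-10-31');
insert into t values ('B', date '2013-11-01', date '2013-11-30');
insert into t values ('A', date '2013-12-01', date '2013-12-31');
insert into t values ('A', date '2014-01-01', date '2014-01-31');
insert into t values ('A', date '2014-02-01', date '2014-02-28');
insert into t values ('A', date '2014-03-01', date '2014-03-31');

commit;
```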

Expected results:

| PRODUCT |                      START_DATE |                         END_DATE |
|---------|---------------------------------|----------------------------------|
|       A |     July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
|       B |  October, 01 2013 00:00:00+0000 |  November, 30 2013 00:00:00+0000 |
|       A | December, 01 2013 00:00:00+0000 |     March, 31 2014 00:00:00+0000 |

See the full SQL Fiddle.

Recommended answer

This is a gaps-and-islands problem. There are various ways to approach it; this uses lead and lag analytic functions:

select distinct product,
  case when start_date is null then lag(start_date)
    over (partition by product order by rn) else start_date end as start_date,
  case when end_date is null then lead(end_date)
    over (partition by product order by rn) else end_date end as end_date
from (
  select product, start_date, end_date, rn
  from (
    select t.product,
      case when lag(end_date)
          over (partition by product order by start_date) is null
        or lag(end_date)
          over (partition by product order by start_date) != start_date - 1
        then start_date end as start_date,
      case when lead(start_date)
          over (partition by product order by start_date) is null
        or lead(start_date)
          over (partition by product order by start_date) != end_date + 1
        then end_date end as end_date,
      row_number() over (partition by product order by start_date) as rn
    from t
  )
  where start_date is not null or end_date is not null
)
order by start_date, product;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13  30-SEP-13 
B       01-OCT-13  30-NOV-13 
A       01-DEC-13  31-MAR-14 

SQL Fiddle

The innermost query looks at the preceding and following records for the product, and only retains the start and/or end time if the records are not contiguous:

select t.product,
  case when lag(end_date)
      over (partition by product order by start_date) is null
    or lag(end_date)
      over (partition by product order by start_date) != start_date - 1
    then start_date end as start_date,
  case when lead(start_date)
      over (partition by product order by start_date) is null
    or lead(start_date)
      over (partition by product order by start_date) != end_date + 1
    then end_date end as end_date
from t;

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                            
A                  30-SEP-13 
A       01-DEC-13            
A                            
A                            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

The next level of select removes the mid-period rows, where both dates were blanked by the inner query, which gives:

PRODUCT START_DATE END_DATE
------- ---------- ---------
A       01-JUL-13            
A                  30-SEP-13 
A       01-DEC-13            
A                  31-MAR-14 
B       01-OCT-13            
B                  30-NOV-13 

The outer query then collapses those adjacent pairs. I've used the easy route of creating duplicates and then eliminating them with distinct. You could do it other ways, such as putting both values into one row of each pair, leaving both values null in the other row, and then eliminating the all-null rows with another layer of select, but I think distinct is OK here.
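As one of the "various ways" to approach a gaps-and-islands problem, here is a sketch of a different, commonly used technique that avoids distinct altogether. It is not the method used in the query above: it flags each row that starts a new island, turns the flags into a running group number, and then aggregates per group. The column names `is_start` and `grp` are made up for illustration.

```sql
select product,
  min(start_date) as start_date,
  max(end_date) as end_date
from (
  -- Running total of the flags gives each island its own number per product.
  select product, start_date, end_date,
    sum(is_start)
      over (partition by product order by start_date) as grp
  from (
    -- is_start = 1 when the row does not directly follow the previous one
    -- (including the first row per product, where lag() returns null).
    select product, start_date, end_date,
      case when lag(end_date)
          over (partition by product order by start_date) = start_date - 1
        then 0 else 1 end as is_start
    from t
  )
)
group by product, grp
order by min(start_date), product;
```

With the sample data, product A gets group numbers 1 (July to September) and 2 (December to March), so this should return the same three rows as the expected results.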

If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query: rather than +/- 1, use an interval of 1 second perhaps, or 1/86400 if you prefer, depending on the precision of your values.
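As a sketch of that adjustment, the innermost query could look like this when the columns are `DATE` values with a time part and one period ends exactly one second before the next begins (for example 23:59:59 followed by 00:00:00). That one-second convention is an assumption; if your periods are stored so that the end equals the next start, compare the values directly instead.

```sql
select t.product,
  case when lag(end_date)
      over (partition by product order by start_date) is null
    or lag(end_date)
      over (partition by product order by start_date)
        != start_date - interval '1' second   -- was: start_date - 1
    then start_date end as start_date,
  case when lead(start_date)
      over (partition by product order by start_date) is null
    or lead(start_date)
      over (partition by product order by start_date)
        != end_date + interval '1' second     -- was: end_date + 1
    then end_date end as end_date
from t;
```

Writing `start_date - 1/86400` and `end_date + 1/86400` is the equivalent day-fraction form for `DATE` columns.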
