同时聚合重叠范围(矩形问题) [英] Simultaneously Aggregating Overlapping Ranges (A Rectangle Problem)

查看:106
本文介绍了同时聚合重叠范围(矩形问题)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以两个间隔考虑一组数据.例如,考虑一个学生的课程表.每个记录都有一个开始日期和结束日期,每个类都有一个句点开始时间和一个句点结束时间.但是从某些记录重叠的意义上说,此计划未标准化".因此,如果搜索包含给定日期和时间段的学生记录,则可能会得到多个匹配项.

Consider a set of data with two intervals. For instance, consider a student schedule of classes. Each record has a begin and end date, and each class has a period start time and a period end time. But this schedule is not 'normalized' in the sense that some records overlap. So if you search for records encompassing a given date and period for a student, you might get multiple matches.

这是一个人为的例子.我将日期表示为整数以简化问题:

Here's a contrived example. I represent the dates as integers to simplify the problem:

declare @schedule table (
    student char(3),
    fromDate int,
    toDate int,
    fromPeriod int,
    toPeriod int
)

insert @schedule values
    ('amy', 1, 7, 7, 9),
    ('amy', 3, 9, 5, 8), 
    ('amy', 10, 12, 1, 3), 
    ('ted', 1, 5, 11, 14),
    ('ted', 7, 11, 13, 16); 

艾米的日期和期间范围重叠或相邻.如果我查询日期5期7",我将获得两场比赛.我需要对它们进行重新处理,以便它们代表相同的区域",但不再重叠.

Amy's date and period ranges either overlap or are adjacent. If I queried for 'date 5 period 7', I would get two matches. I need these reworked so that they represent the same 'area' but no longer overlap.

Ted的时间段重叠,但他的日期不重叠.这意味着没有真正的重叠,因此不需要重新做任何事情.

Ted's periods overlap but his dates do not. This means there's no real overlap, so no need to re-work anything.

我阅读了许多有关重叠工作间隔的文章和文章.即:

I've read many posts and some articles on working overlapping intervals. Namely:

  • Merge overlapping date intervals
  • Flattening intersecting timespans
  • Condense Time Periods with SQL
  • SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value
  • Flatten/merge overlapping time intervals
  • SQL Server separate overlapping dates

我已经从Itzik的一个名为解决方案-装箱日期和时间间隔-难题"的博客中实现了一个方案,该方案对一个特定项目非常有效.我认为这不是一个稳定的链接,但是我找到了它的副本此处.

I've implemented one from Itzik from a blog entitled 'solutions-packing-date-and-time-intervals-puzzle' that has worked great for one particular project. I don't think it's a stable link, but I've found a copy of it here.

但是我很难将这些资源中的知识扩展到眼前的问题上.这可能是我的限制.我在追踪他们时遇到了麻烦.我已经研究过Itzik的解决方案,并且已经了解了很多,但是我确实记得有一件我根本不了解的解决方案.或者可能是那些解决方案仅适用于奇异范围.

But I'm having difficulty extending the knowledge in those resources to my problem at hand. It might be my limitation. I have trouble following them. I've studied Itzik's solution and have come to understand a lot of it, but I do recall there's one piece I just couldn't understand. Or it might be that those solutions only work with singular ranges.

我通过将范围视为文字矩形对象解决了这个问题.有用.我什至在我自己的应用程序中也做了一些性能提高的版本.因此,我将其发布为解决方案,以防遇到相同问题的任何人都可以使用.

I resolved this question by treating the ranges as literal rectangle objects. It works. I've even made a version of it somewhat performant in my own application. So I'll post it as a solution in case it is of use to anyone with the same issue.

但是它是如此之长并且涉及很多,并且有很多奇怪的地方(例如缓冲线,循环形状,使用浮点值,舍入问题),我不禁认为这是一种更好的方法.我列出的资源的概念可以扩展到双重范围吗?还是某些SRID允许切割长度为零的矩形?

But it is so long and involved and there are enough quirks to it (e.g. buffering lines, looping shapes, working with float values, rounding issues) that I can't help but think that there's a much better way. Can the concepts of my listed resources be extended to dual ranges? Or do some SRID's allow cutting rectangles with zero-length lines?

这个问题没有答案,因为您可以汇总范围并以不同的方式对其进行解构.但是,为了最大程度地减少生成的矩形的数量,实际上只有两个可接受的答案.视觉上,在X轴上显示日期,在Y轴上显示句点,重叠的范围可以像这样开始:

There is no one answer to this problem, because you can aggregate ranges and deconstruct them in different ways. But to minimize the number of resulting rectangles, there are really only two acceptable answers. Visually, with dates on the X axis and periods on the Y axis, overlapping ranges can start out like this:

 +------------+
 |            |
 |    +------------+
 |    ||||||||     |  <- 2 overlapping rectangles
 +----|            |
      |            |
      +------------+

我们可以这样修改它:

 +---+ +-----+
 |   | |     |
 |   | |     | +---+  <- 3 non-overlapping 
 |   | |     | |   |     vertically cut rectangles
 +---| |     | |   |
       |     | |   |
       +-----+ +---+

或者这样:

 +-----------+
 +-----------+

 +-----------------+  <- 3 non-overlapping 
 +-----------------+     horizontally cut rectangles

       +-----------+
       +-----------+

进行垂直切割时,结果将如下所示:

Going with vertical cuts, the results would look like this:

+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy    |1       |2     |7         |9       |
|amy    |3       |7     |5         |9       |
|amy    |8       |9     |5         |8       |
|amy    |10      |12    |1         |3       |
|ted    |1       |5     |11        |14      |
|ted    |7       |11    |13        |16      |
+-------------------------------------------+

进行水平切割时,结果将如下所示:

Going with horizontal cuts, the results would look like this:

+-------------------------------------------+
|student|fromDate|toDate|fromPeriod|toPeriod|
|-------------------------------------------|
|amy    |1       |7     |9         |9       |
|amy    |1       |9     |7         |8       |
|amy    |3       |9     |5         |6       |
|amy    |10      |12    |1         |3       |
|ted    |1       |5     |11        |14      |
|ted    |7       |11    |13        |16      |
+-------------------------------------------+

任何一种都可以接受.不过,为了保持确定性和可操作性,您可能希望选择一种策略并坚持下去.

Either is acceptable. Though, to keep it deterministic and tractable, you would want to choose one strategy and stick with it.

推荐答案

这是一个非常有创意的解决方案,并且很有趣!!

that is a very creative solution and an interesting read!!

一种相当简单的方法:

with 

    a as (    

        select student, fromdate from @schedule union
        select student, todate+1 from @schedule    

    ),

    b as (

        select   *, 
                 todate = (
                     select   min(aa.fromdate) 
                     from a as aa 
                     where aa.student = a.student 
                     and aa.fromdate > a.fromdate
                 ) - 1 
        from     a
    )

    select    *
    from      b
    where     exists (
                  select   * 
                  from     @schedule as s 
                  where    s.student = b.student 
                  and      s.fromdate < b.todate 
                  and      s.todate > b.fromdate
              );

这篇关于同时聚合重叠范围(矩形问题)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆