SQL挑战/拼图:如何合并嵌套范围? [英] SQL Challenge/Puzzle: How to merge nested ranges?

查看:207
本文介绍了SQL挑战/拼图:如何合并嵌套范围?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述


  • 这个挑战是基于涉及IP范围的实际使用情况。

  • 我使用的解决方案基于堆栈跟踪我以前提过的挑战。每个范围起始都被视为一个 PUSH 操作,每个范围结束+ 1被视为一个 POP 操作






挑战



我们有一个范围数据集,每个范围都有一个起点

 创建表格范围

range_start int not null
,range_end int不为空
,range_val char(1)不为空

;

一个范围可以包含另一个范围或者跟随另一个范围,但不能等于另一个范围或与其相交另一个范围。



这些是有效范围之间的关系:

 (1)(2)(3)(4)
--------- --------- --------- - ------- -----------
--- --- ---

这些关系是无效

 (5) (6)
------- --------
------- --------

如果以图形方式显示,我们的初始范围可能如下所示(该字母代表 range_val ):

  AAAAAAAA BBCCCCCCC 
DDE F GGGGG
H IIII
J

目标是获取初始范围并创建一个新集根据以下规则:

包含范围将覆盖包含范围的相应子范围。



所请求的结果如图所示,可能看起来像这样

  ADDHAAAF BIIJIGCCC 



要求




  • 解决方案应该是单个SQL查询(子查询都可以)。

  • 使用T-SQL,PL / SQL等是不允许的
  • 使用UDF用户定义函数)不允许



数据示例



  AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC 
DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII
JJ KKKLLL MM NN OOOOO
P QQ

insert (range_start,range_end,range_val)值(1,28,'A');
插入范围(range_start,range_end,range_val)值(31,34,'B');
插入范围(range_start,range_end,range_val)值(39,63,'C');
插入范围(range_start,range_end,range_val)值(1,3,'D');
插入范围(range_start,range_end,range_val)值(4,4,'E');
插入范围(range_start,range_end,range_val)值(7,14,'F');
插入范围(range_start,range_end,range_val)值(19,27,'G');
插入范围(range_start,range_end,range_val)值(43,50,'H');
插入范围(range_start,range_end,range_val)值(55,61,'I');
插入范围(range_start,range_end,range_val)值(1,2,'J');
插入范围(range_start,range_end,range_val)值(9,11,'K');
插入范围(range_start,range_end,range_val)值(12,14,'L');
插入范围(range_start,range_end,range_val)值(22,23,'M');
插入范围(range_start,range_end,range_val)值(25,26,'N');
插入范围(range_start,range_end,range_val)值(57,61,'O');
插入范围(range_start,range_end,range_val)值(13,13,'P');
插入范围(range_start,range_end,range_val)值(60,61,'Q');



请求的结果



这里是空格)

  JJDEAAFFKKKLPLAAAAGGGMMGNNGA BBBB CCCCHHHHHHHHCCCCIIOOOQQCC 

range_start range_end range_val
--- -------- --------- ---------
1 2 J
3 3 D
4 4 E
5 6 A
7 8 F
9 11 K
12 12 L
13 13 P
14 14 L
15 18 A
19 21 G
22 23 M
24 24 G
25 26 N
27 27 G
28 28 A
29 30
31 34 B
35 38
39 42 C
43 50 H
51 54 C
55 56 I
57 59 O
60 61 Q
62 63 C

可选的最后一行:

  64 


解决方案


  • 解决方案基于堆栈跟踪以前提出过的挑战。每个范围的开始都被视为 PUSH 操作,每个范围结束+ 1被视为 POP 操作。 您可能会注意到2个内部分析函数如何使用相同的窗口,因此只需一步即可执行。

     选择new_range_start 
    ,new_range_end

    ,last_value(range_val忽略空值)超过

    分区由stack_depth
    按order by new_range_start,range_start,range_end desc
    rows unbounded前
    )as new_range_val

    from(select new_range_start
    ,range_val
    ,range_start
    ,range_end

    ,sum(如果range_val为null,则为-1,否则为1 end)超过

    order by new_range_start,range_start,range_end desc
    rows unbounded before
    )as stack_depth

    ,min(new_range_start)超过

    由new_range_start命令,range_start,range_end desc
    行在1以下与1以下之间

    ) - 1 as new_range_end
    $ b $ from范围
    从范围
    中选择range_start,range_start,range_end,range_val从范围
    union all选择range_end + 1,range_start,range_end,cast(null作为char(1)) )
    r(new_range_start,range_start,range_end,range_val)

    r

    qualify new_range_end> = new_range_start

    order by new_range_start
    ;



    Oracle



     <$ c_c> select new_range_start 
    ,new_range_end
    ,new_range_val

    from(select new_range_start
    ,new_range_end
    $ b,last_value(range_val忽略空值)超过

    分区由stack_depth
    按顺序由new_range_start,range_start,range_end desc
    行无界前
    )as new_range_val


    from(select new_range_start
    ,range_start
    ,range_end
    ,range_val

    ,sum(如果range_val为null,则为-1,否则为1 end)超过

    order by new_range_start,range_start,range_end desc
    rows unbounded before
    )as stack_depth

    ,lead(new_range_start)超过

    order by new_range_start,range_start,range_end desc
    ) - 1 as new_range_end

    from(select range_start as new_range_start,range_start ,range_end,range_val范围
    union all选择range_end + 1, range_start,range_end,cast(null作为char(1))从范围

    r

    r

    r

    where new_range_end> = new_range_start

    order by new_range_start
    ;



    SQL Server / PostgreSQL / Hive



    <$ p (选择new_range_start
    ,new_range_end
    ,min(range_val)超过

    分区)由$选择
    $ b $ stack_depth,new_range_val_group_id
    )as new_range_val

    from(select new_range_start
    ,new_range_end
    ,range_val
    ,stack_depth

    ,count (range_val)超过

    分区由stack_depth
    按new_range_start排序,range_start,range_en d desc
    rows unbounded前
    )as new_range_val_group_id


    from(select new_range_start
    ,range_start
    ,range_end
    ,range_val

    ,sum(range_val为null时的情况,-1为另一个1的情况)超过

    的顺序by new_range_start,range_start,range_end desc
    行无界前
    )作为堆栈_depth

    ,lead(new_range_start)超过

    order by new_range_start,range_start,range_end desc
    ) - 1 as new_range_end

    from (从范围
    中选择range_start作为new_range_start,range_start,range_end,range_val,range_value作为new_range_start,range_start,range_end,cast(null作为char(1))从范围

    r

    r

    r

    r

    其中new_range_end> = new_range_start

    由new_range_start命令
    ;


    • This challenge is based on a real life use-case involving IP ranges.
    • The solution I came with is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.

    The Challenge

    We have a data set of ranges where each range has a starting point, ending point and a value.

    create table ranges
    (
        range_start     int         not null
       ,range_end       int         not null
       ,range_val       char(1)     not null
    )
    ;
    

    A range can contain another range or follow another range, but cannot be equal to another range or intersect with another range.

    These are valid relations between ranges:

    (1)           (2)           (3)           (4)
    ---------     ---------     ---------     -------- -----------
    ---                 ---        ---
    

    These relations are not valid:

    (5)                (6)
    -------        --------       
    -------              --------
    

    Our initial ranges, when presented graphically, might look something like this (The letter represents range_val):

    AAAAAAAA  BBCCCCCCC
     DDE   F   GGGGG
       H       IIII
                 J
    

    The goal is to take the initial set of ranges and create a new set under the following rule:

    A contained range will override the corresponding sub-range of the the containing range.

    The requested result, when presented graphically, might look something like this

    ADDHAAAF  BIIJIGCCC
    

    Requirements

    • The solution should be a single SQL query (sub-queries are fine).
    • The use of T-SQL, PL/SQL etc. is not allowed.
    • The use of UDF (User Defined Functions) is not allowed.

    Data Sample

    AAAAAAAAAAAAAAAAAAAAAAAAAAAA  BBBB    CCCCCCCCCCCCCCCCCCCCCCCCC
    DDDE  FFFFFFFF    GGGGGGGGG               HHHHHHHH    IIIIIII
    JJ      KKKLLL       MM NN                              OOOOO
                P                                              QQ
    
    insert into ranges (range_start,range_end,range_val) values (1  ,28 ,'A');
    insert into ranges (range_start,range_end,range_val) values (31 ,34 ,'B');
    insert into ranges (range_start,range_end,range_val) values (39 ,63 ,'C');
    insert into ranges (range_start,range_end,range_val) values (1  ,3  ,'D');
    insert into ranges (range_start,range_end,range_val) values (4  ,4  ,'E');
    insert into ranges (range_start,range_end,range_val) values (7  ,14 ,'F');
    insert into ranges (range_start,range_end,range_val) values (19 ,27 ,'G');
    insert into ranges (range_start,range_end,range_val) values (43 ,50 ,'H');
    insert into ranges (range_start,range_end,range_val) values (55 ,61 ,'I');
    insert into ranges (range_start,range_end,range_val) values (1  ,2  ,'J');
    insert into ranges (range_start,range_end,range_val) values (9  ,11 ,'K');
    insert into ranges (range_start,range_end,range_val) values (12 ,14 ,'L');
    insert into ranges (range_start,range_end,range_val) values (22 ,23 ,'M');
    insert into ranges (range_start,range_end,range_val) values (25 ,26 ,'N');
    insert into ranges (range_start,range_end,range_val) values (57 ,61 ,'O');
    insert into ranges (range_start,range_end,range_val) values (13 ,13 ,'P');
    insert into ranges (range_start,range_end,range_val) values (60 ,61 ,'Q');
    

    Requested Results

    (Nulls are presented here as empty spaces)

    JJDEAAFFKKKLPLAAAAGGGMMGNNGA  BBBB    CCCCHHHHHHHHCCCCIIOOOQQCC
    
    range_start range_end range_val
    ----------- --------- ---------
    1           2          J
    3           3          D
    4           4          E
    5           6          A
    7           8          F
    9           11         K
    12          12         L
    13          13         P
    14          14         L
    15          18         A
    19          21         G
    22          23         M
    24          24         G
    25          26         N
    27          27         G
    28          28         A
    29          30         
    31          34         B
    35          38         
    39          42         C
    43          50         H
    51          54         C
    55          56         I
    57          59         O
    60          61         Q
    62          63         C
    

    Optional addition final row:

    64
    

    解决方案

    • The solution is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
    • Performence wise, you may notice how the 2 internal analytic functions use the same windowing, therefore being executed in a single step.

    Teradata

    select      new_range_start
               ,new_range_end
    
               ,last_value (range_val ignore nulls) over 
                (
                    partition by    stack_depth
                    order by        new_range_start ,range_start ,range_end desc 
                    rows            unbounded preceding
                )                                                                   as new_range_val
    
    from       (select      new_range_start
                           ,range_val
                           ,range_start
                           ,range_end
    
                           ,sum (case when range_val is null then -1 else 1 end) over 
                            (
                                order by    new_range_start, range_start ,range_end desc  
                                rows        unbounded preceding
                            )                                                                   as stack_depth
    
                           ,min (new_range_start) over
                            (
                                order by    new_range_start ,range_start ,range_end desc
                                rows        between 1 following and 1 following
    
                            ) - 1                                                               as new_range_end
    
                from        (           select range_start     ,range_start ,range_end ,range_val              from ranges
                            union all   select range_end   + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges
                            )
                            r (new_range_start,range_start,range_end,range_val)
                )
                r
    
    qualify     new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    

    Oracle

    select      new_range_start
               ,new_range_end
               ,new_range_val                       
    
    from       (select      new_range_start
                           ,new_range_end
    
                           ,last_value (range_val ignore nulls) over 
                            (
                                partition by    stack_depth
                                order by        new_range_start ,range_start ,range_end desc 
                                rows            unbounded preceding
                            )                                                                   as new_range_val
    
    
                from       (select      new_range_start
                                       ,range_start
                                       ,range_end
                                       ,range_val
    
                                       ,sum (case when range_val is null then -1 else 1 end) over 
                                        (
                                            order by    new_range_start, range_start ,range_end desc  
                                            rows        unbounded preceding
                                        )                                                                as stack_depth
    
                                       ,lead (new_range_start) over
                                        (
                                            order by    new_range_start, range_start ,range_end desc 
                                        ) - 1                                                            as new_range_end
    
                            from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val              from ranges
                                        union all   select range_end   + 1                    ,range_start ,range_end ,cast (null as char(1)) from ranges
                                        )
                                        r 
                            )
                            r
                )
                r
    
    where       new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    

    SQL Server / PostgreSQL / Hive

    select      *
    
    from       (select      new_range_start
                           ,new_range_end
                           ,min (range_val) over
                            (
                                partition by    stack_depth,new_range_val_group_id
                            )                                                       as new_range_val                       
    
                from       (select      new_range_start
                                       ,new_range_end
                                       ,range_val
                                       ,stack_depth
    
                                       ,count (range_val) over 
                                        (
                                            partition by    stack_depth
                                            order by        new_range_start ,range_start ,range_end desc 
                                            rows            unbounded preceding
                                        )                                                                   as new_range_val_group_id
    
    
                            from       (select      new_range_start
                                                   ,range_start
                                                   ,range_end
                                                   ,range_val
    
                                                   ,sum (case when range_val is null then -1 else 1 end) over 
                                                    (
                                                        order by    new_range_start, range_start ,range_end desc  
                                                        rows        unbounded preceding
                                                    )                                                                as stack_depth
    
                                                   ,lead (new_range_start) over
                                                    (
                                                        order by    new_range_start, range_start ,range_end desc 
                                                    ) - 1                                                            as new_range_end
    
                                        from        (           select range_start     as new_range_start ,range_start ,range_end ,range_val                           from ranges
                                                    union all   select range_end   + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges
                                                    )
                                                    r 
                                        )
                                        r
                            )
                            r
                )
                r
    
    where       new_range_end >= new_range_start
    
    order by    new_range_start
    ;
    

    这篇关于SQL挑战/拼图:如何合并嵌套范围?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆