SQL挑战/拼图:如何合并嵌套范围? [英] SQL Challenge/Puzzle: How to merge nested ranges?
问题描述
- 这个挑战是基于涉及IP范围的实际使用情况。
- 我使用的解决方案基于堆栈跟踪我以前提过的挑战。每个范围起始都被视为一个 PUSH 操作,每个范围结束+ 1被视为一个 POP 操作 。
挑战
我们有一个范围数据集,每个范围都有一个起点
创建表格范围
(
range_start int not null
,range_end int不为空
,range_val char(1)不为空
)
;
一个范围可以包含另一个范围或者跟随另一个范围,但不能等于另一个范围或与其相交另一个范围。
这些是有效范围之间的关系:
(1)(2)(3)(4)
--------- --------- --------- - ------- -----------
--- --- ---
这些关系是无效:
(5) (6)
------- --------
------- --------
如果以图形方式显示,我们的初始范围可能如下所示(该字母代表 range_val ):
AAAAAAAA BBCCCCCCC
DDE F GGGGG
H IIII
J
目标是获取初始范围并创建一个新集根据以下规则:
包含范围将覆盖包含范围的相应子范围。
所请求的结果如图所示,可能看起来像这样
ADDHAAAF BIIJIGCCC
要求
- 解决方案应该是单个SQL查询(子查询都可以)。
- 使用T-SQL,PL / SQL等是不允许的。
- 使用UDF用户定义函数)不允许。
数据示例
AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC
DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII
JJ KKKLLL MM NN OOOOO
P QQ
insert (range_start,range_end,range_val)值(1,28,'A');
插入范围(range_start,range_end,range_val)值(31,34,'B');
插入范围(range_start,range_end,range_val)值(39,63,'C');
插入范围(range_start,range_end,range_val)值(1,3,'D');
插入范围(range_start,range_end,range_val)值(4,4,'E');
插入范围(range_start,range_end,range_val)值(7,14,'F');
插入范围(range_start,range_end,range_val)值(19,27,'G');
插入范围(range_start,range_end,range_val)值(43,50,'H');
插入范围(range_start,range_end,range_val)值(55,61,'I');
插入范围(range_start,range_end,range_val)值(1,2,'J');
插入范围(range_start,range_end,range_val)值(9,11,'K');
插入范围(range_start,range_end,range_val)值(12,14,'L');
插入范围(range_start,range_end,range_val)值(22,23,'M');
插入范围(range_start,range_end,range_val)值(25,26,'N');
插入范围(range_start,range_end,range_val)值(57,61,'O');
插入范围(range_start,range_end,range_val)值(13,13,'P');
插入范围(range_start,range_end,range_val)值(60,61,'Q');
请求的结果
这里是空格)
JJDEAAFFKKKLPLAAAAGGGMMGNNGA BBBB CCCCHHHHHHHHCCCCIIOOOQQCC
range_start range_end range_val
--- -------- --------- ---------
1 2 J
3 3 D
4 4 E
5 6 A
7 8 F
9 11 K
12 12 L
13 13 P
14 14 L
15 18 A
19 21 G
22 23 M
24 24 G
25 26 N
27 27 G
28 28 A
29 30
31 34 B
35 38
39 42 C
43 50 H
51 54 C
55 56 I
57 59 O
60 61 Q
62 63 C
可选的最后一行:
64
- 解决方案基于堆栈跟踪以前提出过的挑战。每个范围的开始都被视为 PUSH 操作,每个范围结束+ 1被视为 POP 操作。 您可能会注意到2个内部分析函数如何使用相同的窗口,因此只需一步即可执行。
选择new_range_start
,new_range_end
,last_value(range_val忽略空值)超过
(
分区由stack_depth
按order by new_range_start,range_start,range_end desc
rows unbounded前
)as new_range_val
from(select new_range_start
,range_val
,range_start
,range_end
,sum(如果range_val为null,则为-1,否则为1 end)超过
(
order by new_range_start,range_start,range_end desc
rows unbounded before
)as stack_depth
,min(new_range_start)超过
(
由new_range_start命令,range_start,range_end desc
行在1以下与1以下之间
) - 1 as new_range_end
$ b $ from范围
从范围
中选择range_start,range_start,range_end,range_val从范围
union all选择range_end + 1,range_start,range_end,cast(null作为char(1)) )
r(new_range_start,range_start,range_end,range_val)
)
r
qualify new_range_end> = new_range_start
order by new_range_start
;
Oracle
<$ c_c> select new_range_start
,new_range_end
,new_range_val
from(select new_range_start
,new_range_end
$ b,last_value(range_val忽略空值)超过
(
分区由stack_depth
按顺序由new_range_start,range_start,range_end desc
行无界前
)as new_range_val
from(select new_range_start
,range_start
,range_end
,range_val
,sum(如果range_val为null,则为-1,否则为1 end)超过
(
order by new_range_start,range_start,range_end desc
rows unbounded before
)as stack_depth
,lead(new_range_start)超过
(
order by new_range_start,range_start,range_end desc
) - 1 as new_range_end
from(select range_start as new_range_start,range_start ,range_end,range_val范围
union all选择range_end + 1, range_start,range_end,cast(null作为char(1))从范围
)
r
)
r
)
r
where new_range_end> = new_range_start
order by new_range_start
;
SQL Server / PostgreSQL / Hive
<$ p (选择new_range_start
,new_range_end
,min(range_val)超过
(
分区)由$选择
$ b $ stack_depth,new_range_val_group_id
)as new_range_val
from(select new_range_start
,new_range_end
,range_val
,stack_depth
,count (range_val)超过
(
分区由stack_depth
按new_range_start排序,range_start,range_en d desc
rows unbounded前
)as new_range_val_group_id
from(select new_range_start
,range_start
,range_end
,range_val
,sum(range_val为null时的情况,-1为另一个1的情况)超过
(
的顺序by new_range_start,range_start,range_end desc
行无界前
)作为堆栈_depth
,lead(new_range_start)超过
(
order by new_range_start,range_start,range_end desc
) - 1 as new_range_end
from (从范围
中选择range_start作为new_range_start,range_start,range_end,range_val,range_value作为new_range_start,range_start,range_end,cast(null作为char(1))从范围
)
r
)
r
)
r
)
r
其中new_range_end> = new_range_start
由new_range_start命令
;
- This challenge is based on a real life use-case involving IP ranges.
- The solution I came with is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
The Challenge
We have a data set of ranges where each range has a starting point, ending point and a value.
create table ranges ( range_start int not null ,range_end int not null ,range_val char(1) not null ) ;
A range can contain another range or follow another range, but cannot be equal to another range or intersect with another range.
These are valid relations between ranges:
(1) (2) (3) (4) --------- --------- --------- -------- ----------- --- --- ---
These relations are not valid:
(5) (6) ------- -------- ------- --------
Our initial ranges, when presented graphically, might look something like this (The letter represents range_val):
AAAAAAAA BBCCCCCCC DDE F GGGGG H IIII J
The goal is to take the initial set of ranges and create a new set under the following rule:
A contained range will override the corresponding sub-range of the the containing range.
The requested result, when presented graphically, might look something like this
ADDHAAAF BIIJIGCCC
Requirements
- The solution should be a single SQL query (sub-queries are fine).
- The use of T-SQL, PL/SQL etc. is not allowed.
- The use of UDF (User Defined Functions) is not allowed.
Data Sample
AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII JJ KKKLLL MM NN OOOOO P QQ insert into ranges (range_start,range_end,range_val) values (1 ,28 ,'A'); insert into ranges (range_start,range_end,range_val) values (31 ,34 ,'B'); insert into ranges (range_start,range_end,range_val) values (39 ,63 ,'C'); insert into ranges (range_start,range_end,range_val) values (1 ,3 ,'D'); insert into ranges (range_start,range_end,range_val) values (4 ,4 ,'E'); insert into ranges (range_start,range_end,range_val) values (7 ,14 ,'F'); insert into ranges (range_start,range_end,range_val) values (19 ,27 ,'G'); insert into ranges (range_start,range_end,range_val) values (43 ,50 ,'H'); insert into ranges (range_start,range_end,range_val) values (55 ,61 ,'I'); insert into ranges (range_start,range_end,range_val) values (1 ,2 ,'J'); insert into ranges (range_start,range_end,range_val) values (9 ,11 ,'K'); insert into ranges (range_start,range_end,range_val) values (12 ,14 ,'L'); insert into ranges (range_start,range_end,range_val) values (22 ,23 ,'M'); insert into ranges (range_start,range_end,range_val) values (25 ,26 ,'N'); insert into ranges (range_start,range_end,range_val) values (57 ,61 ,'O'); insert into ranges (range_start,range_end,range_val) values (13 ,13 ,'P'); insert into ranges (range_start,range_end,range_val) values (60 ,61 ,'Q');
Requested Results
(Nulls are presented here as empty spaces)
JJDEAAFFKKKLPLAAAAGGGMMGNNGA BBBB CCCCHHHHHHHHCCCCIIOOOQQCC range_start range_end range_val ----------- --------- --------- 1 2 J 3 3 D 4 4 E 5 6 A 7 8 F 9 11 K 12 12 L 13 13 P 14 14 L 15 18 A 19 21 G 22 23 M 24 24 G 25 26 N 27 27 G 28 28 A 29 30 31 34 B 35 38 39 42 C 43 50 H 51 54 C 55 56 I 57 59 O 60 61 Q 62 63 C
Optional addition final row:
64
解决方案- The solution is based on the stack trace challenge I've presented previously. Each range start is treated as a PUSH operation and each range end + 1 is treated as a POP operation.
- Performence wise, you may notice how the 2 internal analytic functions use the same windowing, therefore being executed in a single step.
Teradata
select new_range_start ,new_range_end ,last_value (range_val ignore nulls) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val from (select new_range_start ,range_val ,range_start ,range_end ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,min (new_range_start) over ( order by new_range_start ,range_start ,range_end desc rows between 1 following and 1 following ) - 1 as new_range_end from ( select range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges ) r (new_range_start,range_start,range_end,range_val) ) r qualify new_range_end >= new_range_start order by new_range_start ;
Oracle
select new_range_start ,new_range_end ,new_range_val from (select new_range_start ,new_range_end ,last_value (range_val ignore nulls) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val from (select new_range_start ,range_start ,range_end ,range_val ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,lead (new_range_start) over ( order by new_range_start, range_start ,range_end desc ) - 1 as new_range_end from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges ) r ) r ) r where new_range_end >= new_range_start order by new_range_start ;
SQL Server / PostgreSQL / Hive
select * from (select new_range_start ,new_range_end ,min (range_val) over ( partition by stack_depth,new_range_val_group_id ) as new_range_val from (select new_range_start ,new_range_end ,range_val ,stack_depth ,count (range_val) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val_group_id from (select new_range_start ,range_start ,range_end ,range_val ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,lead (new_range_start) over ( order by new_range_start, range_start ,range_end desc ) - 1 as new_range_end from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges ) r ) r ) r ) r where new_range_end >= new_range_start order by new_range_start ;
这篇关于SQL挑战/拼图:如何合并嵌套范围?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!