How to number consecutive records per island?
Question
I have a table that looks like this:
group date color
A 1-1-2019 R
A 1-2-2019 Y
B 1-1-2019 R
B 1-2-2019 Y
B 1-3-2019 Y
B 1-4-2019 R
B 1-5-2019 R
B 1-6-2019 R
It is ordered by group and date. I want an extra column that numbers the consecutive rows with color 'R' within each group, restarting at 1 whenever a new run of 'R' begins.
Required output:
group date color rank
A 1-1-2019 R 1
A 1-2-2019 Y null
B 1-1-2019 R 1
B 1-2-2019 Y null
B 1-3-2019 Y null
B 1-4-2019 R 1
B 1-5-2019 R 2
B 1-6-2019 R 3
I tried a window function partitioned by the group and color columns, but it returns the output below, which is not correct: the second run of 'R' in group B continues from 2 instead of restarting at 1.
Incorrect query and output:
SELECT
*,
RANK() OVER (PARTITION BY group, color order by group, date) as rank
FROM table
group date color rank
A 1-1-2019 R 1
A 1-2-2019 Y null
B 1-1-2019 R 1
B 1-2-2019 Y null
B 1-3-2019 Y null
B 1-4-2019 R 2
B 1-5-2019 R 3
B 1-6-2019 R 4
Is this doable in SQL, or should I switch to another language (like Python)?
Recommended answer
This can be done with window functions. First, a CTE adds a flag showing whether each row has the same color as the previous row, so a flag of 0 marks the start of a new sequence. A second CTE takes a running sum of those starts to give each row a sequence number. Finally, we number the rows within each sequence to get the rank. The query uses MySQL syntax (backtick-quoted identifiers, comparisons that evaluate to 0 or 1) and needs MySQL 8.0 or later for window functions:
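WITH cte AS (SELECT `group`, date, color,
                    -- 1 if this row has the same color as the previous row of its group, else 0
                    COALESCE(color = LAG(color) OVER (PARTITION BY `group` ORDER BY date), 0) AS samecolor
             FROM `table`),
sequences AS (SELECT `group`, date, color,
                     -- running count of sequence starts within the group
                     SUM(samecolor = 0) OVER (PARTITION BY `group` ORDER BY date) AS seq_num
              FROM cte)
SELECT `group`, date, color,
       -- ORDER BY makes the numbering within a sequence deterministic
       ROW_NUMBER() OVER (PARTITION BY `group`, seq_num ORDER BY date) AS `rank`
FROM sequences
ORDER BY `group`, date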
Output:
group date color rank
A 1-1-2019 R 1
A 1-2-2019 Y 1
B 1-1-2019 R 1
B 1-2-2019 Y 1
B 1-3-2019 Y 2
B 1-4-2019 R 1
B 1-5-2019 R 2
B 1-6-2019 R 3
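To trace the three steps outside the database, here is a minimal Python sketch that mirrors them on the sample rows. The function name `rank_runs` and the tuple layout are illustrative assumptions, not part of the original answer, and the sketch treats a change of group as the start of a new sequence.

```python
# Illustrative sketch only: rows are (group, date, color) tuples that are
# already sorted by group and date, as in the sample table.
rows = [
    ("A", "1-1-2019", "R"),
    ("A", "1-2-2019", "Y"),
    ("B", "1-1-2019", "R"),
    ("B", "1-2-2019", "Y"),
    ("B", "1-3-2019", "Y"),
    ("B", "1-4-2019", "R"),
    ("B", "1-5-2019", "R"),
    ("B", "1-6-2019", "R"),
]


def rank_runs(rows):
    """Return (group, date, color, seq_num, rank) for each input row."""
    result = []
    prev_key = None  # (group, color) of the previous row
    seq_num = 0      # running count of sequences started so far (step 2)
    rank = 0         # position inside the current sequence (step 3)
    for group, date, color in rows:
        same = (group, color) == prev_key  # step 1: "same as previous row" flag
        if not same:
            seq_num += 1  # a new sequence starts here
            rank = 0
        rank += 1
        result.append((group, date, color, seq_num, rank))
        prev_key = (group, color)
    return result


ranked = rank_runs(rows)
for row in ranked:
    print(row)
```

The last element of each printed tuple corresponds to the rank column in the output above; the fourth element is the intermediate sequence number, which the SQL query keeps internal.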
Note that this query also ranks the Y values. If you want those to be NULL, replace the definition of rank with this:
CASE WHEN color = 'Y' THEN NULL
     ELSE ROW_NUMBER() OVER (PARTITION BY `group`, seq_num ORDER BY date)
END AS `rank`
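Since the question also asks whether Python would be an option, the same result can be sketched with `itertools.groupby` from the standard library. This is a rough sketch under assumptions: the rows are already sorted by group and date, and the helper name `rank_target_runs` and its `target` parameter are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def rank_target_runs(rows, target="R"):
    """Number consecutive `target` rows per group; other rows get None.

    `rows` is an iterable of (group, date, color) tuples sorted by group, date.
    """
    out = []
    # groupby starts a new run whenever the (group, color) pair changes,
    # which is exactly where a new island begins.
    for (group, color), run in groupby(rows, key=lambda r: (r[0], r[2])):
        for position, (_, date, _) in enumerate(run, start=1):
            rank = position if color == target else None
            out.append((group, date, color, rank))
    return out


sample = [
    ("A", "1-1-2019", "R"),
    ("A", "1-2-2019", "Y"),
    ("B", "1-1-2019", "R"),
    ("B", "1-2-2019", "Y"),
    ("B", "1-3-2019", "Y"),
    ("B", "1-4-2019", "R"),
    ("B", "1-5-2019", "R"),
    ("B", "1-6-2019", "R"),
]

result = rank_target_runs(sample)
for row in result:
    print(row)
```

Keying the runs on both group and color means a run never spans two groups, even if one group ends and the next begins with the same color.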