How to number consecutive records per island?

Question

I have a table which looks like this:

group    date        color
  A      1-1-2019      R
  A      1-2-2019      Y
  B      1-1-2019      R
  B      1-2-2019      Y
  B      1-3-2019      Y
  B      1-4-2019      R
  B      1-5-2019      R
  B      1-6-2019      R

It is ordered by group and date. I want an extra column showing a sequential number for each run of consecutive 'R' rows within each group.

Required output:

group    date        color    rank
  A      1-1-2019      R      1
  A      1-2-2019      Y      null
  B      1-1-2019      R      1
  B      1-2-2019      Y      null
  B      1-3-2019      Y      null
  B      1-4-2019      R      1
  B      1-5-2019      R      2
  B      1-6-2019      R      3

I've tried using a window function partitioned by the group and color columns, but it returns the output below, which is not correct: the numbering of 'R' rows in group B continues across the 'Y' rows instead of restarting.

Incorrect query and output:

SELECT 
    *, 
    RANK() OVER (PARTITION BY group, color order by group, date) as rank
FROM table

group    date        color    rank
  A      1-1-2019      R      1
  A      1-2-2019      Y      null
  B      1-1-2019      R      1
  B      1-2-2019      Y      null
  B      1-3-2019      Y      null
  B      1-4-2019      R      2
  B      1-5-2019      R      3
  B      1-6-2019      R      4

I'm wondering whether this is doable in SQL, or whether I should switch to another language (such as Python).

Recommended answer

This can be done with window functions. First, a CTE computes a flag (samecolor) that is 1 when a row has the same color as the previous row and 0 otherwise, so a 0 marks the start of a new sequence. A second CTE takes a running count of those 0 flags, which gives each sequence its own number (seq_num). Finally, the rows within each sequence are numbered to get the rank:

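-- samecolor is 1 when the row continues the previous row's color, else 0.
-- LAG is partitioned by `group` so a sequence never spans two groups.
WITH cte AS (SELECT `group`, date, color,
                    COALESCE(color = LAG(color) OVER (PARTITION BY `group` ORDER BY date), 0) AS samecolor
             FROM `table`),
-- Running count of sequence starts: every row in a sequence shares one seq_num.
sequences AS (SELECT `group`, date, color,
              SUM(samecolor = 0) OVER (ORDER BY `group`, date) AS seq_num
              FROM cte)
-- Number the rows inside each sequence; ORDER BY date makes the numbering deterministic.
SELECT `group`, date, color,
       ROW_NUMBER() OVER (PARTITION BY seq_num ORDER BY date) AS `rank`
FROM sequences
ORDER BY `group`, date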

Output:

group   date        color   rank
A       1-1-2019    R       1
A       1-2-2019    Y       1
B       1-1-2019    R       1
B       1-2-2019    Y       1
B       1-3-2019    Y       2
B       1-4-2019    R       1
B       1-5-2019    R       2
B       1-6-2019    R       3

Demo on dbfiddle
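
To make the intermediate columns visible, the sketch below replays the same flag, running-count, and row-number steps in SQLite through Python's standard sqlite3 module. It is not the answer's MySQL setup: the table name records, the column name grp (standing in for group), the output column rnk, and the ISO-formatted dates are all assumptions made for illustration, and it needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data mirroring the question's table (ISO dates assumed).
ROWS = [
    ("A", "2019-01-01", "R"),
    ("A", "2019-01-02", "Y"),
    ("B", "2019-01-01", "R"),
    ("B", "2019-01-02", "Y"),
    ("B", "2019-01-03", "Y"),
    ("B", "2019-01-04", "R"),
    ("B", "2019-01-05", "R"),
    ("B", "2019-01-06", "R"),
]

QUERY = """
WITH cte AS (
    -- 1 if the row has the same color as the previous row of its group, else 0
    SELECT grp, date, color,
           COALESCE(color = LAG(color) OVER (PARTITION BY grp ORDER BY date), 0) AS samecolor
    FROM records
),
sequences AS (
    -- running count of sequence starts gives each island its own number
    SELECT grp, date, color, samecolor,
           SUM(samecolor = 0) OVER (ORDER BY grp, date) AS seq_num
    FROM cte
)
SELECT grp, date, color, samecolor, seq_num,
       CASE WHEN color = 'R'
            THEN ROW_NUMBER() OVER (PARTITION BY seq_num ORDER BY date)
       END AS rnk
FROM sequences
ORDER BY grp, date
"""


def island_rows():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (grp TEXT, date TEXT, color TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in island_rows():
        print(row)
```

Each printed tuple carries samecolor and seq_num next to the final rnk, so it is easy to check that seq_num changes exactly where a new run begins and that only the 'R' rows receive a number.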

Note that this query also ranks the Y rows. If you want those to be NULL, replace the definition of rank with this:

CASE WHEN color = 'Y' THEN NULL
     ELSE ROW_NUMBER() OVER (PARTITION BY seq_num ORDER BY date)
     END AS `rank`
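
As for switching to Python: the SQL above is enough, but the same logic is also short outside the database. The following is only a rough sketch, not part of the original answer. The helper name number_runs and its target parameter are hypothetical, and it assumes the rows arrive as (group, date, color) tuples already sorted by group and date.

```python
from itertools import groupby


def number_runs(rows, target="R"):
    """Append a run counter to each (group, date, color) row.

    Rows must already be sorted by group and date. Rows whose color is
    not `target` get None instead of a counter.
    """
    numbered = []
    # groupby starts a new run whenever the (group, color) pair changes
    # between adjacent rows, which is exactly an island boundary.
    for (_, color), run in groupby(rows, key=lambda r: (r[0], r[2])):
        for position, row in enumerate(run, start=1):
            numbered.append((*row, position if color == target else None))
    return numbered


if __name__ == "__main__":
    sample = [
        ("A", "1-1-2019", "R"),
        ("A", "1-2-2019", "Y"),
        ("B", "1-1-2019", "R"),
        ("B", "1-2-2019", "Y"),
        ("B", "1-3-2019", "Y"),
        ("B", "1-4-2019", "R"),
        ("B", "1-5-2019", "R"),
        ("B", "1-6-2019", "R"),
    ]
    for row in number_runs(sample):
        print(row)
```

Because itertools.groupby only groups adjacent items, it depends on the input order in the same way the window functions depend on their ORDER BY clause.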
