Count consecutive duplicate values by group


Question

I have searched the site for a solution to this question but have been unable to find an answer that fits precisely what I am looking for. I am attempting to count consecutive duplicate values for each ID number, ordered by date. My current table resembles the first 3 columns of the table below, and the fourth column is what I would like to add.

ID  |  date  | value  | consec_duplicates
1      1/1       1            0
1      1/2       2            0
1      1/3       2            1
1      1/4       2            2
1      1/5       3            0
1      1/6       3            1
2      1/14      1            0
2      1/15      2            0
2      1/16      3            0
2      1/17      3            1
2      1/18      4            0
2      1/19      5            0
3      1/4       1            0
3      1/5       2            0
3      1/6       2            1
3      1/7       2            2
3      1/8       2            3
3      1/9       3            0

Does anybody have any ideas for how to construct this fourth column? Thanks!

Recommended answer

This is a gaps-and-islands problem. One method is to use the difference of two row_number()s to identify the groups.

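select t.*,
       -- grp: a label for each run of equal values within an id
       dense_rank() over (partition by id order by (seqnum - seqnum_value), value) as grp,
       -- grp_seqnum: 1-based position of the row inside its run
       row_number() over (partition by id, (seqnum - seqnum_value), value order by date) as grp_seqnum
from (select t.*,
             -- seqnum counts rows per id; seqnum_value counts rows per id and value
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, value order by date) as seqnum_value
      from t
     ) t;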
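
As a way to check the query against the sample table, the sketch below loads the question's rows into an in-memory SQLite database through Python's sqlite3 module. It subtracts 1 from the in-run row number so the count starts at 0, as in the consec_duplicates column. The SQLite setup, the year 2020 in the dates, and the alias s are assumptions made for illustration and are not part of the original answer; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Rows from the question's table. The year 2020 is an assumption so that
# the dates sort correctly as ISO text.
values_by_id = {
    1: [("2020-01-01", 1), ("2020-01-02", 2), ("2020-01-03", 2),
        ("2020-01-04", 2), ("2020-01-05", 3), ("2020-01-06", 3)],
    2: [("2020-01-14", 1), ("2020-01-15", 2), ("2020-01-16", 3),
        ("2020-01-17", 3), ("2020-01-18", 4), ("2020-01-19", 5)],
    3: [("2020-01-04", 1), ("2020-01-05", 2), ("2020-01-06", 2),
        ("2020-01-07", 2), ("2020-01-08", 2), ("2020-01-09", 3)],
}
rows = [(i, d, v) for i, pairs in values_by_id.items() for d, v in pairs]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, value integer)")
conn.executemany("insert into t values (?, ?, ?)", rows)

query = """
select s.id, s.date, s.value,
       row_number() over (partition by id, (seqnum - seqnum_value), value
                          order by date) - 1 as consec_duplicates
from (select t.*,
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, value order by date) as seqnum_value
      from t
     ) s
order by s.id, s.date
"""
result = conn.execute(query).fetchall()
consec = [r[3] for r in result]
print(consec)
```

For the sample data this reproduces the fourth column of the question's table, row for row.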

This is a bit tricky to understand the first time you see it. If you run the subquery and stare at the results long enough, you'll see why the difference is constant for adjacent rows with the same value.
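
To make the constant difference concrete, here is a small sketch that runs only the subquery for ID 1 and prints both row numbers next to their difference. It assumes SQLite 3.25 or later through Python's sqlite3 module and an arbitrary year of 2020 for the dates; neither comes from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, value integer)")
# ID 1 from the question; the year 2020 is assumed so the text dates sort correctly.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "2020-01-01", 1), (1, "2020-01-02", 2), (1, "2020-01-03", 2),
     (1, "2020-01-04", 2), (1, "2020-01-05", 3), (1, "2020-01-06", 3)],
)

subquery = """
select value,
       row_number() over (partition by id order by date) as seqnum,
       row_number() over (partition by id, value order by date) as seqnum_value
from t
order by date
"""
# Each tuple is (value, seqnum, seqnum_value, seqnum - seqnum_value).
steps = [(v, s, sv, s - sv) for v, s, sv in conn.execute(subquery)]
for row in steps:
    print(row)
# (1, 1, 1, 0)
# (2, 2, 1, 1)
# (2, 3, 2, 1)
# (2, 4, 3, 1)
# (3, 5, 1, 4)
# (3, 6, 2, 4)
```

Inside a run, both row numbers go up by one on every row, so their difference stays fixed. When the value changes, seqnum_value restarts or resumes at a lower count while seqnum keeps climbing, so the difference jumps.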

I think Jorge is right. Within each ID, your data never has the same value come back after a different value, so you can just do:

select t.*,
       row_number() over (partition by id, value order by date) as grp_seqnum
from t;
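
The shortcut depends on that property of the data. The sketch below compares the two approaches on hypothetical rows, not taken from the question, in which value 2 returns after a 3. Both results have 1 subtracted so they start at 0 like consec_duplicates. SQLite through Python's sqlite3 module, the alias s, and the dates are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, value integer)")
# Hypothetical rows (not from the question): value 2 comes back after a 3.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, "2020-01-01", 2), (1, "2020-01-02", 2),
     (1, "2020-01-03", 3), (1, "2020-01-04", 2)],
)

# Shortcut: number rows per (id, value), ignoring whether they are adjacent.
simple_sql = """
select row_number() over (partition by id, value order by date) - 1
from t
order by id, date
"""

# Gaps-and-islands: number rows per run of adjacent equal values.
islands_sql = """
select row_number() over (partition by id, (seqnum - seqnum_value), value
                          order by date) - 1
from (select t.*,
             row_number() over (partition by id order by date) as seqnum,
             row_number() over (partition by id, value order by date) as seqnum_value
      from t
     ) s
order by s.id, s.date
"""

simple = [r[0] for r in conn.execute(simple_sql)]
islands = [r[0] for r in conn.execute(islands_sql)]
print(simple)   # [0, 1, 0, 2]
print(islands)  # [0, 1, 0, 0]
```

On the last row the shortcut keeps counting from the earlier run of 2s, while the row_number() difference starts a new run. If a value can recur within an ID, the longer query is the safer choice.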
