查找分区中最大的连续数字组 [英] Finding the largest group of consecutive numbers within a partition

查看:89
本文介绍了查找分区中最大的连续数字组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下数据按player_id和match_date排序。我想找出连续运行次数最多的记录组(从2014-04-03到2014-04-12连续3次运行4次)

I have the following data ordered by player_id and match_date. I would like to find out the group of records that has the maximum number of consecutive runs (4 runs from 2014-04-03 till 2014-04-12 for 3 consecutive times)

 player_id  match_date  runs
    1       2014-04-01    5
    1       2014-04-02    55       
    1       2014-04-03    4       
    1       2014-04-10    4       
    1       2014-04-12    4       
    1       2014-04-14    3       
    1       2014-04-19    4       
    1       2014-04-20    44               
    2       2014-04-01    23
    2       2014-04-02    23       
    2       2014-04-03    23       
    2       2014-04-10    23       
    2       2014-04-12    4       
    2       2014-04-14    3       
    2       2014-04-19    23       
    2       2014-04-20    1   

我想出了以下SQL:

select *,row_number() over (partition by ranked.player_id,ranked.runs
order by ranked.match_date) as R from (
select player_id ,match_date,runs from players order by 1,2 desc )
ranked order by ranked.player_id, match_date asc

但是,此继续前一次连续运行的排名(4次在2014年4月19日,玩家1预计会获得等级1,但由于已有3次相同分区出现,因此获得了等级4)。同样,在2014-04-19上,玩家2的23奔跑有望获得等级1,但获得等级5,因为已经有4次出现此玩家23奔跑。

But this continues the rank from the previous consecutive runs (4 runs on 2014-04-19 for Player 1 is expected to get Rank 1 but gets Rank 4 since there were 3 occurrences of the same partition already). Similarly 23 runs for Player 2 on 2014-04-19 is expected to get Rank 1 but gets Rank 5 since there were 4 occurrences of 23 runs already for this player.

当运行值从上一行更改时,如何重置排名至1?

How do I reset the rank back to 1 when the value of runs changes from its previous row?

模式,数据,SQL,其输出在 SQLFiddle

Schema, data, SQL and the output is available on SQLFiddle.

推荐答案

您可以使用窗口函数执行此操作。

You can do this with window functions.

select player_id, runs, count(*) as numruns
from (select p.*,
             (row_number() over (partition by player_id order by match_date) -
              row_number() over (partition by player_id, runs order by match_date)
             ) as grp
      from players p
     ) pg
group by grp, player_id, runs
order by numruns desc
limit 1;

主要观察到的是按顺序运行具有此属性:如果枚举行( (对于每个玩家)(按日期),您可以按日期枚举每个玩家的行数,并按日期枚举运行次数,那么当运行次数相同且顺序相同时,差异是恒定的。这样便形成了一个组,可用于汇总以标识所需的玩家。

The key observation is that "runs in a sequence" have this property: if you enumerate the rows (for each player) by date and you enumerate the rows for each player and by the runs by date, then the difference is constant when the runs are all the same and in order. That forms a group that you can use for aggregation to identify the player you want.

此处是SQL提琴。

这篇关于查找分区中最大的连续数字组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆