Get longest streak of consecutive weeks by group in pandas

Question

I'm working with weekly data for different subjects, but some subjects have long stretches with no data, so for every id I want to keep only its longest streak of consecutive weeks. My data looks like this:

id    week
1      8
1      15
1      60
1      61
1      62
2      10
2      11
2      12
2      13
2      25
2      26

My expected output is:

id    week
1      60
1      61
1      62
2      10
2      11
2      12
2      13

I got close by marking rows with a 1 where week == week.shift() + 1 (and the id matches the previous row). The problem is that this approach doesn't mark the first week of a streak, and I still can't filter for the longest one:

df.loc[ (df['id'] == df['id'].shift())&(df['week'] == df['week'].shift()+1),'streak']=1

On my example, this produces:

id    week  streak
1      8     nan
1      15    nan
1      60    nan
1      61    1
1      62    1
2      10    nan
2      11    1
2      12    1
2      13    1
2      25    nan
2      26    1
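What this attempt is missing is a label for each streak, rather than a flag on the rows that continue one. A minimal sketch, assuming the rows are sorted by id and then week: flag every row that does not continue the previous week of the same id (the first row of each id has no previous week, so its diff is NaN and it gets flagged too), then take a cumulative sum so every streak gets its own number. The column name run is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week": [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
})

# True where a row does NOT continue the previous row of the same id:
# the first row of each id (diff is NaN) or a gap larger than one week.
starts = df.groupby("id")["week"].diff().ne(1)

# A running count of streak starts gives every streak its own label.
df["run"] = starts.cumsum()
print(df)
```

Unlike the 1/NaN flag, the label also covers the first week of each streak, so rows can be grouped and counted by (id, run).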

Any ideas on how to achieve what I want?

Recommended answer

Try this:

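import pandas as pd  # df has columns 'id' and 'week', sorted by id, then week
# Number each run of consecutive weeks, then count the rows in each (id, run) group
df['consec'] = df.groupby(['id', df['week'].diff(-1).ne(-1).shift(fill_value=True).cumsum()])['week'].transform('count')
df[df.groupby('id')['consec'].transform('max') == df['consec']]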

Output:

   id  week  consec
2   1    60       3
3   1    61       3
4   1    62       3
5   2    10       4
6   2    11       4
7   2    12       4
8   2    13       4
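The grouping key in the answer is dense, so here is a sketch that evaluates its pieces one at a time on the sample data (the steps frame and its column names exist only for illustration). diff(-1) subtracts the next row's week, so ne(-1) is True on the last row of every streak, including the final row, where the difference is NaN. Shifting those flags down one row turns "last row of a streak" into "first row of the next streak", and cumsum numbers the streaks. The shift is written with fill_value=True, which fills the first row directly and keeps the column boolean; it partitions the rows the same way as .shift().bfill(). Because the answer also groups by id, a run that happens to continue across two ids is still split.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week": [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
})

steps = pd.DataFrame({"week": df["week"]})
# Week minus the NEXT row's week; -1 means the next row continues the streak.
steps["diff_next"] = df["week"].diff(-1)
# True on the last row of a streak (also on the final row, where diff is NaN).
steps["ends_streak"] = steps["diff_next"].ne(-1)
# Move each "end" flag down one row so it marks the start of the next streak.
steps["starts_streak"] = steps["ends_streak"].shift(fill_value=True)
# Cumulative count of starts = streak label.
steps["label"] = steps["starts_streak"].cumsum()
print(steps)
```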

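The final filter keeps every streak whose length equals its id's maximum, so an id with two equally long streaks keeps both. If exactly one streak per id is wanted, one possible variant (not part of the answer) labels streaks with a per-id diff, measures them with size, and picks the earliest longest streak of each id with idxmax. It assumes rows are sorted by id, then week; the names run, sizes and best are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "week": [8, 15, 60, 61, 62, 10, 11, 12, 13, 25, 26],
})

# Label streaks: a new label starts at each id's first row or after a gap.
run = df.groupby("id")["week"].diff().ne(1).cumsum().rename("run")

# Length of every streak, indexed by (id, run).
sizes = df.groupby([df["id"], run]).size()

# For each id, the (id, run) pair of its first longest streak.
best = sizes.groupby(level="id").idxmax()

# Keep only rows whose (id, run) pair was chosen.
keys = pd.MultiIndex.from_arrays([df["id"], run])
result = df[keys.isin(best.tolist())]
print(result)
```

On the sample data this selects rows 2 to 8, the same rows as the answer, because neither id has a tie.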