Find days since last event in a pandas DataFrame
Problem description
I have a pandas DataFrame:
df12 = pd.DataFrame({'group_ids':[1,1,1,2,2,2],'dates':['2016-04-01','2016-04-20','2016-04-28','2016-04-05','2016-04-20','2016-04-29'],'event_today_in_group':[1,0,1,1,1,0]})
   group_ids       dates  event_today_in_group
0          1  2016-04-01                     1
1          1  2016-04-20                     0
2          1  2016-04-28                     1
3          2  2016-04-05                     1
4          2  2016-04-20                     1
5          2  2016-04-29                     0
I would like to compute an additional column that contains, for each group_ids, the number of days since the last time event_today_in_group was 1:
   group_ids       dates  event_today_in_group  days_since_last_event
0          1  2016-04-01                     1                      0
1          1  2016-04-20                     0                     19
2          1  2016-04-28                     1                     27
3          2  2016-04-05                     1                      0
4          2  2016-04-20                     1                     15
5          2  2016-04-29                     0                      9
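Recommended answer
As I mentioned earlier, this gets you the non-cumulative difference in days between consecutive dates within each group (df is the question's df12; its dates column is converted to datetimes first, because diff() cannot subtract strings):
df['dates'] = pd.to_datetime(df['dates'])  # the question stores dates as strings
df['days_since_last_event'] = df.groupby('group_ids')['dates'].diff().dt.days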
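As a quick check of this first step, here is a minimal, self-contained sketch that rebuilds the question's data and prints the per-group gaps. The variable name day_gaps is only illustrative and does not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': ['2016-04-01', '2016-04-20', '2016-04-28',
              '2016-04-05', '2016-04-20', '2016-04-29'],
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# The question builds 'dates' as strings; convert so diff() yields timedeltas.
df['dates'] = pd.to_datetime(df['dates'])

# Days between each row and the previous row of the same group.
# The first row of every group has no predecessor, so its gap is NaN.
day_gaps = df.groupby('group_ids')['dates'].diff().dt.days
print(day_gaps.tolist())  # [nan, 19.0, 8.0, nan, 15.0, 9.0]
```

Row 2 shows 8 here rather than the desired 27, which is why the gaps still have to be accumulated between events.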
To get a cumulative sum of this difference that restarts after each event (each row where event_today_in_group is 1), I propose using shift to get the value of the previous row and then generating a cumulative sum, like so:
df['event_today_in_group'].shift().cumsum()
Output:
0    NaN
1    1.0
2    1.0
3    2.0
4    3.0
5    4.0
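To see why the shift matters, the sketch below places this counter next to the event flags. The column name block_key is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# shift() moves each flag down one row, so the running count only increases
# on the row *after* an event; the event row itself stays in the earlier block.
df['block_key'] = df['event_today_in_group'].shift().cumsum()
print(df)
```

Rows 1 and 2 share the key 1.0, so their gaps of 19 and 8 days add up to 27 on the event row. Without the shift, row 2 would start a new block and report only 8.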
This gives us the second grouping value we need to get the cumulative sums. You could assign the above values to a new column, but if you're only using them for the calculation, you can simply pass them to the subsequent groupby operation, like so:
df.loc[:, 'days_since_last_event'] = df.groupby(['group_ids', df['event_today_in_group'].shift().cumsum()])['days_since_last_event'].cumsum()
Result:
   group_ids       dates  event_today_in_group  days_since_last_event
0          1  2016-04-01                     1                    NaN
1          1  2016-04-20                     0                   19.0
2          1  2016-04-28                     1                   27.0
3          2  2016-04-05                     1                    NaN
4          2  2016-04-20                     1                   15.0
5          2  2016-04-29                     0                    9.0
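The steps of this answer can be put together into one runnable sketch. The final fillna(0) is an addition that is not part of the original answer; it assumes that 0 is the wanted value where no earlier row exists, as in the question's desired output. The names gaps, block_key and days are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': ['2016-04-01', '2016-04-20', '2016-04-28',
              '2016-04-05', '2016-04-20', '2016-04-29'],
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})
df['dates'] = pd.to_datetime(df['dates'])

# Step 1: per-group gap in days to the previous row.
gaps = df.groupby('group_ids')['dates'].diff().dt.days

# Step 2: counter that increases on the row after each event.
block_key = df['event_today_in_group'].shift().cumsum()

# Step 3: accumulate the gaps within each (group, block) pair.
days = gaps.groupby([df['group_ids'], block_key]).cumsum()

# Assumption (not in the original answer): show 0 instead of NaN on the
# first row of each group, matching the question's desired output.
df['days_since_last_event'] = days.fillna(0).astype(int)
print(df)
```

One caveat with this approach: if a group does not start with an event, the rows before its first event accumulate days from the group's first row, since there is no earlier event to count from.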