Difference between consecutive dates in pandas groupby
Question
I have a data frame as follows:
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2003, 4, 4),
              pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8),
              pd.Timestamp(2005, 2, 3)],
})
id val dates
0 102 9 2002-01-01
1 102 2 2002-03-03
2 102 4 2003-04-04
3 103 7 2003-08-09
4 103 6 2005-02-03
5 103 3 2005-02-08
6 104 2 2005-02-03
What I want to achieve is, instead of having the dates column, to have a diff_dates column that represents the difference (in days) between consecutive dates per id, where the first entry for each id in the diff_dates column is 0. With that said, the resulting data frame should be:
df_processed_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104], "val": [9,2,4,7,6,3,2], "diff_dates": [0, 61, 397, 0, 544, 5, 0]})
id val diff_dates
0 102 9 0
1 102 2 61
2 102 4 397
3 103 7 0
4 103 6 544
5 103 3 5
6 104 2 0
Recommended answer
# fillna's downcast argument is deprecated in recent pandas, so cast explicitly
df_raw_dates.groupby('id').dates.diff().dt.days.fillna(0).astype('int64')
0 0
1 61
2 397
3 0
4 544
5 5
6 0
Name: dates, dtype: int64
To assign this back as a new column, do
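df_raw_dates['date_diff'] = (
    df_raw_dates
    .pop('dates')                    # remove the dates column and keep it as a Series
    .groupby(df_raw_dates['id'])     # group that Series by the id column
    .diff()                          # difference to the previous date within each id
    .dt.days                         # timedelta -> number of days (NaN for the first row)
    .fillna(0)                       # first entry of each id becomes 0
    .astype('int64'))                # explicit cast instead of the deprecated downcast argument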
df_raw_dates
id val date_diff
0 102 9 0
1 102 2 61
2 102 4 397
3 103 7 0
4 103 6 544
5 103 3 5
6 104 2 0
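For reference, here is a self-contained sketch of the same idea that leaves the original frame unmodified and names the new column diff_dates, as in the question. It rests on one assumption not stated above: diff() compares each row with the row immediately before it in the same group, so the rows need to be in date order within each id. The sample data already is, and the sort_values call is only a safeguard. The names df and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": pd.to_datetime([
        "2002-01-01", "2002-03-03", "2003-04-04",
        "2003-08-09", "2005-02-03", "2005-02-08", "2005-02-03",
    ]),
})

# Safeguard (an assumption, not part of the original answer): make sure
# rows are in chronological order within each id before calling diff().
df = df.sort_values(["id", "dates"])

# Day difference to the previous row of the same id; the first row of
# each id has no predecessor, so its NaN is replaced with 0.
day_diff = df.groupby("id")["dates"].diff().dt.days.fillna(0).astype("int64")

# Build a new frame instead of mutating df in place.
result = df.assign(diff_dates=day_diff).drop(columns="dates")

print(result)
```

Unlike the pop-based version, this keeps the dates column in df, which can be handy if the dates are still needed later.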