How to calculate the day difference between successive pandas dataframe rows with a condition
Problem description
I have a pandas dataframe like the following.
item_id date
101 2016-01-05
101 2016-01-21
121 2016-01-08
121 2016-01-22
128 2016-01-19
128 2016-02-17
131 2016-01-11
131 2016-01-23
131 2016-01-24
131 2016-02-06
131 2016-02-07
I want to calculate the difference in days between values in the date column, but with respect to the item_id column. First I want to sort the dataframe by date within each item_id group. It should look like this:
item_id date
101 2016-01-05
101 2016-01-08
121 2016-01-21
121 2016-01-22
128 2016-01-17
128 2016-02-19
131 2016-01-11
131 2016-01-23
131 2016-01-24
131 2016-02-06
131 2016-02-07
Then I want to calculate the difference between dates, again grouping on item_id, so the output should look like the following:
item_id date day_difference
101 2016-01-05 0
101 2016-01-08 3
121 2016-01-21 0
121 2016-01-22 1
128 2016-01-17 0
128 2016-02-19 2
131 2016-01-11 0
131 2016-01-23 12
131 2016-01-24 1
131 2016-02-06 13
131 2016-02-07 1
For sorting I used something like this:
df.groupby('item_id').apply(lambda x: new_df.sort('date'))
But it didn't work. I am able to calculate the difference between consecutive rows with the following:
(df['date'] - df['date'].shift(1))
but not with the grouping on item_id.
Recommended answer
I think you can use:
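import numpy as np
# Sort by item_id, then by date within each item (assumes 'date' is already datetime64)
df = df.sort_values(['item_id', 'date']).reset_index(drop=True)
# Day difference between consecutive rows of the same item_id
df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D')
# The first row of each group has no predecessor; replace its NaN with 0
df['diff'] = df['diff'].fillna(0)
print(df)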
item_id date diff
0 101 2016-01-05 0
1 101 2016-01-21 16
2 121 2016-01-08 0
3 121 2016-01-22 14
4 128 2016-01-19 0
5 128 2016-02-17 29
6 131 2016-01-11 0
7 131 2016-01-23 12
8 131 2016-01-24 1
9 131 2016-02-06 13
10 131 2016-02-07 1
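For anyone who wants to try the grouped approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the first table in the question and computes the per-item difference in whole days. The column name day_difference is taken from the question's expected output. Sorting with sort_values on both columns and converting to integers are choices made for this illustration, not necessarily what the answer's author intended.

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    "item_id": [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    "date": [
        "2016-01-05", "2016-01-21", "2016-01-08", "2016-01-22",
        "2016-01-19", "2016-02-17", "2016-01-11", "2016-01-23",
        "2016-01-24", "2016-02-06", "2016-02-07",
    ],
})
df["date"] = pd.to_datetime(df["date"])

# Sort by item_id, then by date inside each item_id
df = df.sort_values(["item_id", "date"]).reset_index(drop=True)

# diff() within each group yields Timedelta values; the first row of each group is NaT
delta = df.groupby("item_id")["date"].diff()

# Convert to whole days, treating the missing first-row value as 0
df["day_difference"] = delta.dt.days.fillna(0).astype(int)

print(df)
```

Because diff() is called on the grouped column, the first row of each item_id never picks up a date from the previous item, which is the problem with the plain shift(1) attempt in the question.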