How to calculate time difference by group using pandas?
Problem description
I want to calculate diff by group, but I don't know how to sort the time column so that each group's results are sorted and positive.
Original data:
In [37]: df
Out[37]:
id time
0 A 2016-11-25 16:32:17
1 A 2016-11-25 16:36:04
2 A 2016-11-25 16:35:29
3 B 2016-11-25 16:35:24
4 B 2016-11-25 16:35:46
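To reproduce the frame above, here is a minimal sketch. It assumes the time values start out as strings and are parsed with pd.to_datetime; the question does not show how df was built. Parsing this way gives a datetime64 column, and subtracting two of its values yields a Timedelta, which is the type diff produces.

```python
import pandas as pd

# Rebuild the question's sample data (assumption: times are parsed from strings)
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})

print(df.dtypes)
# Subtracting two timestamps gives a Timedelta
print(df["time"].iloc[1] - df["time"].iloc[0])
```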
The result I want:
Out[40]:
id time
0 A 00:35
1 A 03:12
2 B 00:22
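The desired output writes each gap as minutes:seconds (00:35), while pandas prints timedeltas as 00:00:35. If a string column in that exact shape is needed, this is one possible sketch. to_mm_ss is a hypothetical helper; it assumes NaT rows have already been removed and it discards fractional seconds.

```python
import pandas as pd

def to_mm_ss(deltas: pd.Series) -> pd.Series:
    """Hypothetical helper: format non-null timedeltas as zero-padded "MM:SS"."""
    secs = deltas.dt.total_seconds().astype(int)       # whole seconds only
    minutes = (secs // 60).astype(str).str.zfill(2)
    seconds = (secs % 60).astype(str).str.zfill(2)
    return minutes + ":" + seconds

deltas = pd.Series(pd.to_timedelta(["00:00:35", "00:03:12", "00:00:22"]))
print(to_mm_ss(deltas).tolist())  # ['00:35', '03:12', '00:22']
```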
Note: the type of the time column is timedelta64[ns].
In [38]: df['time'].diff(1)
Out[38]:
0 NaT
1 00:03:47
2 -1 days +23:59:25
3 -1 days +23:59:55
4 00:00:22
Name: time, dtype: timedelta64[ns]
This did not give the desired result.
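The negative values come from how Series.diff works: it subtracts each row from the row directly above it, in the current row order, with no notion of id. A small sketch on the sample data (rebuilt here under the same parsing assumption) shows that, and also that sorting alone still leaves one bad value where the first B row is subtracted from the last A row.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})

# Row-by-row difference in the current order, ignoring id
naive = df["time"].diff()
print(naive)

# Sorted by id and time, but the first B row still subtracts the last A row
sorted_only = df.sort_values(["id", "time"])["time"].diff()
print(sorted_only)
```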
The solution should not only solve the problem but also run fast, because there are 50 million rows.
Recommended answer
You can sort with sort_values, then group with groupby and call diff within each group:
df['diff'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
print(df)
id time diff
0 A 2016-11-25 16:32:17 NaT
1 A 2016-11-25 16:36:04 00:00:35
2 A 2016-11-25 16:35:29 00:03:12
3 B 2016-11-25 16:35:24 NaT
4 B 2016-11-25 16:35:46 00:00:22
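A self-contained version of the line above, assuming the sample data is built with pd.to_datetime. The diff result carries the sorted frame's index labels, and assigning it to df["diff"] aligns on those labels, which is why df itself keeps its original row order while each value is computed in sorted order.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})

# Sort so each id's timestamps ascend, then diff within each id;
# assignment aligns on the index, so df's row order is unchanged
df["diff"] = df.sort_values(["id", "time"]).groupby("id")["time"].diff()
print(df)
```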
If you need to remove the rows with NaT in the diff column, use dropna:
df = df.dropna(subset=['diff'])
print(df)
id time diff
2 A 2016-11-25 16:35:29 00:03:12
1 A 2016-11-25 16:36:04 00:00:35
4 B 2016-11-25 16:35:46 00:00:22
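Since each id's earliest row has no predecessor, dropna removes exactly one row per group. The printed output above lists the rows in per-id time order (2, 1, 4); a sketch that gets that order explicitly sorts after dropping, again assuming the sample data is parsed with pd.to_datetime.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})
df["diff"] = df.sort_values(["id", "time"]).groupby("id")["time"].diff()

# One NaT per group (its first row), so one row per group is dropped
kept = df.dropna(subset=["diff"])
assert len(kept) == len(df) - df["id"].nunique()

# Put the remaining rows into per-id time order
kept = kept.sort_values(["id", "time"])
print(kept)
```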
You can also overwrite the column:
df['time'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
print(df)
id time
0 A NaT
1 A 00:00:35
2 A 00:03:12
3 B NaT
4 B 00:00:22
df['time'] = df.sort_values(['id','time']).groupby('id')['time'].diff()
df = df.dropna(subset=['time'])
print(df)
id time
1 A 00:00:35
2 A 00:03:12
4 B 00:00:22
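With 50 million rows, one alternative worth benchmarking (a sketch, not part of the answer above) is to sort once, take a plain diff of the whole column, and then set to NaT every row where id changes, because that row's diff spans two groups. group_time_diff is a hypothetical helper name and it assumes df has a unique index. groupby(...).diff() is already vectorized in pandas, so whether this is actually faster depends on your pandas version and data; measure before switching.

```python
import pandas as pd

def group_time_diff(df: pd.DataFrame, key: str = "id", col: str = "time") -> pd.Series:
    """Hypothetical helper: per-group diff of `col` without groupby.

    Assumes df has a unique index.
    """
    s = df.sort_values([key, col])
    d = s[col].diff()
    # A row whose key differs from the previous row starts a new group;
    # its diff was taken across a group boundary, so blank it out
    starts = s[key].ne(s[key].shift())
    d[starts] = pd.NaT
    # Return the values in df's original row order
    return d.reindex(df.index)

df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2016-11-25 16:32:17",
        "2016-11-25 16:36:04",
        "2016-11-25 16:35:29",
        "2016-11-25 16:35:24",
        "2016-11-25 16:35:46",
    ]),
})
df["diff"] = group_time_diff(df)
print(df)
```

Converting id to a categorical dtype before sorting may also reduce memory on a frame this size, though that too is worth measuring.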