Aggregations for Timedelta values in the Python DataFrame
Problem description
I have a big DataFrame (df) that looks like:
    Acc_num  date_diff
0        29    0:04:43
1        29    0:01:43
2        29    2:22:45
3        29    0:16:21
4        29    0:58:20
5        30    0:00:35
6        34    7:15:26
7        34    4:40:01
8        34    0:56:02
9        34    6:53:44
10       34    1:36:58
.....
Acc_num int64
date_diff timedelta64[ns]
dtype: object
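To reproduce the problem, the sample rows above can be rebuilt as a small DataFrame. This is a minimal sketch that assumes only the 11 rows shown (the real frame is larger) and writes the durations as zero-padded "HH:MM:SS" strings so that pd.to_timedelta parses them into a timedelta64[ns] column, matching the dtypes listed above.

```python
import pandas as pd

# Rebuild only the 11 sample rows shown in the question (assumption:
# the real DataFrame is much larger but has the same two columns).
df = pd.DataFrame(
    {
        "Acc_num": [29, 29, 29, 29, 29, 30, 34, 34, 34, 34, 34],
        # Zero-padded "HH:MM:SS" strings parsed into timedelta64[ns]
        "date_diff": pd.to_timedelta(
            ["00:04:43", "00:01:43", "02:22:45", "00:16:21", "00:58:20",
             "00:00:35",
             "07:15:26", "04:40:01", "00:56:02", "06:53:44", "01:36:58"]
        ),
    }
)

# Expected dtypes: Acc_num is int64, date_diff is timedelta64[ns]
print(df.dtypes)
```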
I need to calculate the mean of 'date_diff' (as a timedelta) for each account number.
df.date_diff.mean()
works correctly. But when I try the following:
df.groupby('Acc_num').date_diff.mean()
it raises an exception:
"DataError: No numeric types to aggregate"
I also tried the df.pivot_table() method, but that didn't get me anywhere either.
Could someone help me with this? Thank you in advance!
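For comparison, here is a minimal sketch of the whole-column mean that the question reports as working, run on the 11 sample rows only (an assumption; on the full frame the value will differ). The call reduces the timedelta64 column to a single Timedelta scalar, which is the format the per-account result should also have.

```python
import pandas as pd

# Same 11 sample rows as in the question (assumed, for illustration only)
df = pd.DataFrame(
    {
        "Acc_num": [29, 29, 29, 29, 29, 30, 34, 34, 34, 34, 34],
        "date_diff": pd.to_timedelta(
            ["00:04:43", "00:01:43", "02:22:45", "00:16:21", "00:58:20",
             "00:00:35",
             "07:15:26", "04:40:01", "00:56:02", "06:53:44", "01:36:58"]
        ),
    }
)

# Mean over the whole column: the 11 durations total 90398 s, so the
# mean is 8218 s, i.e. 2 h 16 min 58 s, returned as a pd.Timedelta.
overall = df.date_diff.mean()
print(overall)
```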
Recommended answer
Weird limitation indeed. But a simple solution would be:
df.groupby('Acc_num').date_diff.agg(lambda g: g.sum() / g.count())
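A self-contained run of this workaround, again assuming the 11 sample rows are the whole frame for illustration. Within each group, Series.sum() skips NaT and Series.count() counts only non-missing values, so the ratio is the mean of the non-missing durations, and dividing a Timedelta by an integer keeps the result a Timedelta.

```python
import pandas as pd

# Same 11 sample rows as in the question (assumed, for illustration only)
df = pd.DataFrame(
    {
        "Acc_num": [29, 29, 29, 29, 29, 30, 34, 34, 34, 34, 34],
        "date_diff": pd.to_timedelta(
            ["00:04:43", "00:01:43", "02:22:45", "00:16:21", "00:58:20",
             "00:00:35",
             "07:15:26", "04:40:01", "00:56:02", "06:53:44", "01:36:58"]
        ),
    }
)

# Per-account mean computed as (sum of durations) / (number of durations);
# each group g is a timedelta64 Series, so g.sum() is a Timedelta.
result = df.groupby("Acc_num").date_diff.agg(lambda g: g.sum() / g.count())

# Values: 29 -> 0:44:46.4, 30 -> 0:00:35, 34 -> 4:16:26.2
print(result)
```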
Edit:
mean() also works if you pass numeric_only=False:
df.groupby('Acc_num').date_diff.mean(numeric_only=False)
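If neither call works on a given pandas version, a different workaround, not part of the original answer, is to average the durations as float seconds and convert the group means back with pd.to_timedelta. A sketch on the same assumed sample rows:

```python
import pandas as pd

# Same 11 sample rows as in the question (assumed, for illustration only)
df = pd.DataFrame(
    {
        "Acc_num": [29, 29, 29, 29, 29, 30, 34, 34, 34, 34, 34],
        "date_diff": pd.to_timedelta(
            ["00:04:43", "00:01:43", "02:22:45", "00:16:21", "00:58:20",
             "00:00:35",
             "07:15:26", "04:40:01", "00:56:02", "06:53:44", "01:36:58"]
        ),
    }
)

# Convert each duration to float seconds (NaT would become NaN and be
# skipped by mean), average per account, then convert back to timedelta.
mean_seconds = df["date_diff"].dt.total_seconds().groupby(df["Acc_num"]).mean()
mean_td = pd.to_timedelta(mean_seconds, unit="s")

print(mean_td)
```

Because the averaging happens in float64 seconds, the round-trip can introduce sub-microsecond rounding, which is why the pure-Timedelta approaches above are preferable when they work.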