Fast, efficient pandas Groupby sum / mean without aggregation
Problem description
Grouping and aggregating in pandas is easy and fast. However, performing a simple groupby-apply operation that pandas already has built in C, without aggregating, is far slower, at least the way I do it, because of a lambda function.
# Form data
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.random((100,3)),columns=['a','b','c'])
>>> df['g'] = np.random.randint(0,3,100)
>>> df.head()
          a         b         c  g
0  0.901610  0.643869  0.094082  1
1  0.536437  0.836622  0.763244  1
2  0.647989  0.150460  0.476552  0
3  0.206455  0.319881  0.690032  2
4  0.153557  0.765174  0.377879  1
# groupby and apply and aggregate
>>> df.groupby('g')['a'].sum()
g
0    17.177280
1    15.395264
2    17.668056
Name: a, dtype: float64
# groupby and apply without aggregation
>>> df.groupby('g')['a'].transform(lambda x: x.sum())
0     15.395264
1     15.395264
2     17.177280
3     17.668056
4     15.395264
        ...
95    15.395264
96    17.668056
97    15.395264
98    17.668056
99    17.177280
Name: a, Length: 100, dtype: float64
Thus, the lambda function gives me the functionality I want, but it is slow.
>>> %timeit df.groupby('g')['a'].sum()
1.11 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df.groupby('g')['a'].transform(lambda x:x.sum())
4.01 ms ± 699 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This becomes a problem with larger datasets. I assume there is a faster and more efficient way to get this functionality.
Recommended answer
You are probably looking for
df.groupby('g')['a'].transform('sum')
which is indeed faster than the lambda version:
import numpy as np
import pandas as pd
import timeit
df = pd.DataFrame(np.random.random((100,3)),columns=['a','b','c'])
df['g'] = np.random.randint(0,3,100)
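# Plain aggregation: one value per group
def groupby():
    df.groupby('g')['a'].sum()

# Transform with a Python lambda: called once per group
def transform_apply():
    df.groupby('g')['a'].transform(lambda x: x.sum())

# Transform with the string alias: uses the built-in grouped sum
def transform():
    df.groupby('g')['a'].transform('sum')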
print('groupby',timeit.timeit(groupby,number=10))
print('lambda transform',timeit.timeit(transform_apply,number=10))
print('transform',timeit.timeit(transform,number=10))
Output:
groupby 0.010655807999999989
lambda transform 0.029328375000000073
transform 0.01493376600000007
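As a sanity check on the answer above, here is a minimal sketch showing that the string alias returns the same per-row values as the lambda version, and that the same pattern works for the mean mentioned in the title. The fixed seed and the variable names (rng, grouped, fast_sum, slow_sum, fast_mean, mapped_sum) are assumptions added for illustration and are not part of the original post. The last lines show one possible alternative, aggregating once and mapping the result back onto the rows; it was not benchmarked here.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 3)), columns=['a', 'b', 'c'])
df['g'] = rng.integers(0, 3, 100)

grouped = df.groupby('g')['a']

# String alias: the group sum is broadcast back to every row of that group.
fast_sum = grouped.transform('sum')

# Lambda version from the question, kept only for comparison.
slow_sum = grouped.transform(lambda x: x.sum())

# Both results have one value per original row and agree numerically.
assert len(fast_sum) == len(df)
assert np.allclose(fast_sum, slow_sum)

# The same pattern applies to the mean.
fast_mean = grouped.transform('mean')

# Alternative sketch: aggregate once, then map each row's group key to its group sum.
mapped_sum = df['g'].map(grouped.sum())
assert np.allclose(fast_sum, mapped_sum)
```

Because the result is aligned to the original index, it can be assigned straight back as a new column, for example df['a_sum'] = fast_sum.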