Best way (run-time) to aggregate (calculate ratio of) sum to total count based on group by
Question
I'm trying to compute the ratio of approved applications (flagged '1' in STATUS_Approved, otherwise '0') to total applications for each person (Cust_ID). I have implemented this logic with the following code, but it takes about 10 minutes to compute for 1.6M records. Is there a faster way to perform the same operation?
# Finding ratio of approved out of total applications
df_approved_ratio = df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x))
Recommended answer
I think you need to aggregate with mean, which avoids calling a Python lambda once per group:
import pandas as pd

df = pd.DataFrame({'STATUS_Approved': [0, 1, 0, 0, 1, 1],
                   'Cust_ID': list('aaabbb')})
print (df)
STATUS_Approved Cust_ID
0 0 a
1 1 a
2 0 a
3 0 b
4 1 b
5 1 b
df_approved_ratio = df.groupby('Cust_ID')['STATUS_Approved'].mean()
print (df_approved_ratio)
Cust_ID
a 0.333333
b 0.666667
Name: STATUS_Approved, dtype: float64
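Why the mean gives the requested ratio: because every flag is 0 or 1, summing the flags of a customer counts that customer's approved applications, so dividing by the number of rows is exactly the mean. In the sketch below, G_c (the rows with Cust_ID = c), s_i (the STATUS_Approved value of row i) and n_c are notation introduced here for illustration.

```latex
\text{ratio}_c
  = \frac{\text{approved applications of } c}{\text{all applications of } c}
  = \frac{1}{n_c} \sum_{i \in G_c} s_i
  = \operatorname{mean}_{i \in G_c}(s_i),
\qquad s_i \in \{0, 1\},\quad n_c = |G_c|
```

For customer a in the example, the flags 0, 1, 0 give 1/3 ≈ 0.333333, matching the output above.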
# The original apply-based approach gives the same result:
print (df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x)))
Cust_ID
a 0.333333
b 0.666667
Name: STATUS_Approved, dtype: float64
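A minimal sketch for checking the speed-up on your own machine, assuming a synthetic DataFrame with the same two columns. The sizes N_ROWS and N_CUSTOMERS, the random data and the timed helper are hypothetical, not the asker's real data. It confirms that the original apply version, mean, and a literal sum / size formulation agree, and shows how named aggregation can keep the approved and total counts next to the ratio. Timings depend on hardware and pandas version, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sizes for a synthetic benchmark (not the asker's 1.6M-row data).
N_ROWS = 50_000
N_CUSTOMERS = 5_000

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Cust_ID': rng.integers(0, N_CUSTOMERS, size=N_ROWS),
    'STATUS_Approved': rng.integers(0, 2, size=N_ROWS),  # random 0/1 flag
})


def timed(label, func):
    """Hypothetical helper: run func once, print its wall-clock time, return its result."""
    start = time.perf_counter()
    result = func()
    print(f'{label}: {time.perf_counter() - start:.3f} s')
    return result


def sum_over_size():
    """The question's formula, sum / number of rows, without a per-group lambda."""
    grouped = df.groupby('Cust_ID')['STATUS_Approved']
    return grouped.sum() / grouped.size()


# Original approach: one Python lambda call per customer group.
ratio_apply = timed('apply', lambda: df.groupby('Cust_ID').apply(
    lambda x: x['STATUS_Approved'].sum() / len(x)))

# Built-in groupby aggregation: the mean of a 0/1 flag is the approved ratio.
ratio_mean = timed('mean', lambda: df.groupby('Cust_ID')['STATUS_Approved'].mean())

ratio_sum_size = timed('sum / size', sum_over_size)

# Named aggregation keeps the approved and total counts next to the ratio.
summary = df.groupby('Cust_ID').agg(
    approved=('STATUS_Approved', 'sum'),
    total=('STATUS_Approved', 'size'),
)
summary['ratio'] = summary['approved'] / summary['total']

# Sanity check: every variant yields the same ratio per Cust_ID.
pd.testing.assert_series_equal(ratio_apply, ratio_mean, check_names=False)
pd.testing.assert_series_equal(ratio_sum_size, ratio_mean, check_names=False)
pd.testing.assert_series_equal(summary['ratio'], ratio_mean, check_names=False)
print(summary.head())
```

One caveat on the design choice: mean skips NaN values in STATUS_Approved, whereas len(x) in the original lambda counts every row. If the flag column can contain missing values, the sum / size variant reproduces the lambda's result; otherwise fill or drop the missing flags before using mean.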