Best way (run-time) to aggregate (calculate the ratio of) sum to total count based on group by

Views: 63 | Published: 2021/4/21 18:55:36 | Tags: pandas, python-3.6, calculation

Question

I'm trying to compute the ratio of approved applications (flagged '1', otherwise '0') to total applications for each person (Cust_ID). The following code implements this logic, but it takes about 10 minutes to run on 1.6 million records. Is there a faster way to perform the same operation?

# Finding ratio of approved out of total applications
df_approved_ratio = df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x))

Recommended answer

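I think you need groupby with mean. Because STATUS_Approved contains only 0 and 1, its per-group mean equals the sum divided by the group size: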

import pandas as pd

# Small sample: three applications each for customers 'a' and 'b'
df = pd.DataFrame({'STATUS_Approved':[0,1,0,0,1,1],
                   'Cust_ID':list('aaabbb')})

print (df)
   STATUS_Approved Cust_ID
0                0       a
1                1       a
2                0       a
3                0       b
4                1       b
5                1       b

df_approved_ratio = df.groupby('Cust_ID')['STATUS_Approved'].mean()
print (df_approved_ratio)
Cust_ID
a    0.333333
b    0.666667
Name: STATUS_Approved, dtype: float64

print (df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x)))
Cust_ID
a    0.333333
b    0.666667
Name: STATUS_Approved, dtype: float64
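
To see the difference at a larger scale, here is a rough sketch that runs both approaches on synthetic data and checks that they agree. The data set, its size, the number of customers, and the random seed are assumptions made for illustration, not the asker's data. Timings will vary by machine and pandas version, so none are quoted here. The usual explanation for the gap is that apply calls the Python lambda once per group, while the built-in mean aggregation avoids that per-group Python call.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data (assumed sizes, not the asker's 1.6 M records)
rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame({
    "Cust_ID": rng.integers(0, 10_000, size=n_rows),
    "STATUS_Approved": rng.integers(0, 2, size=n_rows),
})

# Original approach: a Python lambda is called once per group
start = time.perf_counter()
slow = df.groupby("Cust_ID").apply(
    lambda x: x["STATUS_Approved"].sum() / len(x)
)
apply_seconds = time.perf_counter() - start

# Suggested approach: built-in mean aggregation on the flag column
start = time.perf_counter()
fast = df.groupby("Cust_ID")["STATUS_Approved"].mean()
mean_seconds = time.perf_counter() - start

# Both results are indexed by Cust_ID in sorted order and should match
assert np.allclose(slow.to_numpy(), fast.to_numpy())
print(f"apply: {apply_seconds:.3f}s, mean: {mean_seconds:.3f}s")
```

Note that the equivalence relies on the flag column holding only 0 and 1 with no missing values: mean skips NaN entries, whereas len(x) counts every row in the group.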
