Best way (run-time) to aggregate (calculate the ratio of) sum to total count based on group by

Question

I'm trying to compute, for each person (Cust_ID), the ratio of approved applications (flagged '1', otherwise '0') to total applications. The code below implements this logic, but it takes about 10 minutes for 1.6 M records. Is there a faster way to perform the same operation?

import pandas as pd

# Finding ratio of approved out of total applications (df is the existing DataFrame)
df_approved_ratio = df.groupby('Cust_ID').apply(lambda x: x['STATUS_Approved'].sum() / len(x))

Answer

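I think you need mean. Because STATUS_Approved only holds 0 and 1, summing it per Cust_ID and dividing by the number of rows is exactly the group mean, and the built-in groupby mean avoids calling a Python lambda once per customer: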

import pandas as pd

df = pd.DataFrame({'STATUS_Approved': [0, 1, 0, 0, 1, 1],
                   'Cust_ID': list('aaabbb')})

print(df)
   STATUS_Approved Cust_ID
0                0       a
1                1       a
2                0       a
3                0       b
4                1       b
5                1       b

df_approved_ratio = df.groupby('Cust_ID')['STATUS_Approved'].mean()
print(df_approved_ratio)
Cust_ID
a    0.333333
b    0.666667
Name: STATUS_Approved, dtype: float64

# Same result as the original apply-based code from the question:
print(df.groupby('Cust_ID').apply(lambda x: x['STATUS_Approved'].sum() / len(x)))
Cust_ID
a    0.333333
b    0.666667
Name: STATUS_Approved, dtype: float64
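The two results above only agree because the flag column has no missing values. A minimal sketch, assuming the same toy data plus a hypothetical customer 'c' whose second row has a missing flag, shows where they diverge: mean skips NaN in both the numerator and the denominator, while len(x) in the question's code counts every row. If missing flags are possible and you want the original semantics, dividing the vectorized per-group sum by the per-group size() reproduces them without a per-group lambda.

```python
import numpy as np
import pandas as pd

# Toy data from the answer, plus a hypothetical customer 'c'
# whose second row has a missing flag (NaN), added only for illustration.
df = pd.DataFrame({'STATUS_Approved': [0, 1, 0, 0, 1, 1, 1, np.nan],
                   'Cust_ID': list('aaabbbcc')})

g = df.groupby('Cust_ID')['STATUS_Approved']

# sum() skips NaN; size() counts all rows, matching sum() / len(x) in the question.
ratio_sum_size = g.sum() / g.size()

# mean() skips NaN in the numerator and in the denominator.
ratio_mean = g.mean()

print(ratio_sum_size)
print(ratio_mean)
```

For customer 'c' the first line gives 1 / 2 while mean gives 1 / 1; for 'a' and 'b' both match the outputs shown above. When the column has no NaN, as in the question, mean() is the simpler choice.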
