Getting a ratio in Pandas groupby object

Problem description

I have a dataframe that looks like this:

I want to create another column called "engaged_percent" for each state, which is the number of unique engaged_count values divided by the user_count of that particular state.

I tried doing the following:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count']
    return pd.Series({'engaged_percent': engaged_percent})

by = df3.groupby(['user_state']).apply(f)
by

But it gave me the following result:

What I want is something like this:

user_state        engaged_percent
---------------------------------
California           2/21 = 0.09
Florida              2/7 =  0.28

I think my approach is correct; however, I am not sure why my result shows up like the one in the second picture.

Any help would be much appreciated! Thanks in advance!

Recommended answer

How about:

user_count=df3.groupby('user_state')['user_count'].mean()
#(or however you think a value for each state should be calculated)

engaged_unique=df3.groupby('user_state')['engaged_count'].nunique()

engaged_pct=engaged_unique/user_count

(You could also do this in one line in a bunch of different ways.)
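As one possible illustration of the "one line" remark, here is a minimal sketch using named aggregation (available in pandas 0.25 and later). The sample rows are hypothetical, since the original dataframe was not preserved; they are chosen only so that the ratios match the 2/21 and 2/7 from the question. Taking the mean of user_count follows the answer above and assumes user_count is constant within each state.

```python
import pandas as pd

# Hypothetical sample data; the real df3 from the question is not available.
df3 = pd.DataFrame({
    "user_state": ["California", "California", "California", "Florida", "Florida"],
    "user_count": [21, 21, 21, 7, 7],
    "engaged_count": [5, 5, 9, 3, 4],
})

# Aggregate both columns per state in one pass, then divide them.
engaged_pct = (
    df3.groupby("user_state")
       .agg(engaged_unique=("engaged_count", "nunique"),
            user_count=("user_count", "mean"))
       .pipe(lambda g: g["engaged_unique"] / g["user_count"])
       .rename("engaged_percent")
)

print(engaged_pct)
```

The result is a Series indexed by user_state with one ratio per state. If user_count varies within a state, a different aggregation (for example max or sum) may be more appropriate than mean.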

Your original solution was almost fine, except that you were dividing a single value by the entire user_count Series, so you got a Series back instead of a single value. You could try this slight variation:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count'].mean()
    return engaged_percent

by = df3.groupby(['user_state']).apply(f)
by
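To see why the original function returned a Series, it may help to look at a single group in isolation. The sketch below uses hypothetical rows for one state (not the asker's data) and compares dividing by the whole user_count column with dividing by its mean.

```python
import pandas as pd

# Hypothetical rows for a single state, standing in for one groupby group.
group = pd.DataFrame({
    "user_count": [21, 21, 21],
    "engaged_count": [5, 5, 9],
})

n_unique = group["engaged_count"].nunique()  # a single number: 2

# Scalar divided by a Series broadcasts: one ratio per row of the group.
per_row = n_unique / group["user_count"]

# Scalar divided by a scalar: one ratio for the whole group.
per_group = n_unique / group["user_count"].mean()

print(type(per_row).__name__, len(per_row))
print(per_group)
```

Because per_row has one entry per row, wrapping it in pd.Series inside apply yields a nested result, whereas per_group gives apply one plain number per state.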
