Pandas groupwise percentage


Question

How can I calculate a group-wise percentage in pandas?

Similar to "Pandas: .groupby().size() and percentages" or "Pandas Very Simple Percent of total size from Group by", I want to calculate the percentage of a value per group.

How can I achieve this?

My dataset is structured like

ClassLabel, Field

Initially, I aggregate on both ClassLabel and Field like this:

grouped = mydf.groupby(['Field', 'ClassLabel']).size().reset_index()
grouped = grouped.rename(columns={0: 'customersCountPerGroup'})

Now I would like to know the percentage of customers in each group on a per-group basis. The group totals can be obtained with mydf.groupby(['Field']).size(), but I can neither merge that in as a column, nor am I sure this is the right approach. There must be something simpler.

Edit

I want to calculate the percentage based only on a single group. For example, for Field 3 the ClassLabel values 0 and 1 have the shares 0.125 and 0.250. Their sum is 0.125 + 0.250 = 0.375, and I want to use this value to divide / normalize grouped, rather than grouped.sum().

Solution

IIUC you can use:

import pandas as pd

mydf = pd.DataFrame({'Field':[1,1,3,3,3],
                   'ClassLabel':[4,4,4,4,4],
                   'A':[7,8,9,5,7]})

print (mydf)
   A  ClassLabel  Field
0  7           4      1
1  8           4      1
2  9           4      3
3  5           4      3
4  7           4      3

grouped = mydf.groupby(['Field', 'ClassLabel']).size()
print (grouped)
Field  ClassLabel
1      4             2
3      4             3
dtype: int64

print (100 * grouped / grouped.sum())
Field  ClassLabel
1      4             40.0
3      4             60.0
dtype: float64


grouped = mydf.groupby(['Field', 'ClassLabel']).size().reset_index()
grouped = grouped.rename(columns={0: 'customersCountPerGroup'})
print (grouped)
   Field  ClassLabel  customersCountPerGroup
0      1           4                       2
1      3           4                       3

grouped['per'] = 100 * grouped.customersCountPerGroup / grouped.customersCountPerGroup.sum()
print (grouped)
   Field  ClassLabel  customersCountPerGroup   per
0      1           4                       2  40.0
1      3           4                       3  60.0
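
The two steps above can also be folded into one small helper. This is only a sketch: the function name add_overall_percentage is hypothetical, and it assumes the count column is named through reset_index(name=...) instead of the rename call used above.

```python
import pandas as pd

def add_overall_percentage(df, keys, count_col="customersCountPerGroup"):
    """Count rows per key combination and add each count's share of all rows."""
    # size() counts rows per group; reset_index(name=...) names the count column
    out = df.groupby(keys).size().reset_index(name=count_col)
    # Share of the grand total, expressed as a percentage
    out["per"] = 100 * out[count_col] / out[count_col].sum()
    return out

mydf = pd.DataFrame({'Field': [1, 1, 3, 3, 3],
                     'ClassLabel': [4, 4, 4, 4, 4],
                     'A': [7, 8, 9, 5, 7]})
result = add_overall_percentage(mydf, ['Field', 'ClassLabel'])
print(result)
```

The percentages here are relative to the total number of rows, so they sum to 100 across all groups.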

Edit, following the comment:

mydf = pd.DataFrame({'Field':[1,1,3,3,3,4,5,6],
                   'ClassLabel':[0,0,0,1,1,0,0,6],
                   'A':[7,8,9,5,7,5,6,4]})

print (mydf)

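# count the rows for each (Field, ClassLabel) pair
grouped = mydf.groupby(['Field', 'ClassLabel']).size()
# share of each pair in the total number of rows
df = grouped / grouped.sum()

# divide each count by the summed share of its Field
df = (grouped / df.groupby(level=0).transform('sum')).reset_index(name='new')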
print (df)
   Field  ClassLabel       new
0      1           0  8.000000
1      3           0  2.666667
2      3           1  5.333333
3      4           0  8.000000
4      5           0  8.000000
5      6           6  8.000000
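
Note that the new column above is each count divided by its Field's share of all rows, so the values are not bounded by 1. If the goal is instead the share of each ClassLabel within its own Field, one possible sketch divides each count by the Field total obtained with groupby(level=0).transform('sum'). This is an assumption about the intent, not part of the original answer, and the column name share_in_field is made up for illustration.

```python
import pandas as pd

mydf = pd.DataFrame({'Field': [1, 1, 3, 3, 3, 4, 5, 6],
                     'ClassLabel': [0, 0, 0, 1, 1, 0, 0, 6],
                     'A': [7, 8, 9, 5, 7, 5, 6, 4]})

# Rows per (Field, ClassLabel) pair
counts = mydf.groupby(['Field', 'ClassLabel']).size()
# Total rows per Field, broadcast back onto every (Field, ClassLabel) row
field_totals = counts.groupby(level=0).transform('sum')
# Fraction of each Field's rows that fall into each ClassLabel
within = (counts / field_totals).reset_index(name='share_in_field')
print(within)
```

With this normalization the shares inside each Field sum to 1. Multiply by 100 if percentages are preferred.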
