Recalculate mean considering each count


Problem description

Suppose the DataFrame looks like this:

index yearmon   college major   gpa     num
0     20140401  1       a       3.36    29
1     20180401  2       b       2.63    48
2     20160401  3       c       3.23    55
3     20170401  4       d       4.22    1
4     20140401  3       b       3.72    72

Each row gives the average GPA for one year, one college, and one major.

I want to build a new data set containing the average GPA per major only.

For example, major b has two rows from different times,

so I need to compute a new GPA average that takes num (the number of students) into account.

I tried the groupby function, but it only computes a plain average, as if every row had the same num (it does not take the num variable into account).

Is there a way to solve this problem?

Recommended answer

Option 1: a lazy way, given that the number of students is an integer, is to repeat each row num times and take the ordinary mean:

(df.loc[df.index.repeat(df['num']), ['major', 'gpa']]
   .groupby('major').mean()
)
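
To see why the repeat trick works, here is a minimal sketch that rebuilds the sample table from the question and expands it. Constructing `df` inline and the names `expanded` and `weighted` are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; building it inline is an assumption.
df = pd.DataFrame({
    "yearmon": [20140401, 20180401, 20160401, 20170401, 20140401],
    "college": [1, 2, 3, 4, 3],
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# Each index label is repeated `num` times, so the row with num=48
# appears 48 times in the expanded frame.
expanded = df.loc[df.index.repeat(df["num"]), ["major", "gpa"]]
print(len(expanded))  # 205, the total number of students

# An ordinary mean over the expanded rows is a num-weighted mean
# over the original rows.
weighted = expanded.groupby("major").mean()
print(weighted.round(3))
```

The cost is memory: the expanded frame has one row per student, so this is only comfortable when the counts are small.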


Option 2: groupby().apply() with np.average:

import numpy as np

# weighted average of gpa within each major, using num as the weights
(df.groupby('major')
   .apply(lambda x: np.average(x['gpa'], weights=x['num']))
)


Option 3: the most complicated but best-performing approach is to assign the total score and calculate the average manually:

df['total'] = df['gpa'] * df['num']
groups = df.groupby('major')
out = groups['total'].sum()/groups['num'].sum()


Output:

         gpa
major       
a      3.360
b      3.284
c      3.230
d      4.220
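
As a sanity check, the sketch below rebuilds the sample data and confirms that Options 2 and 3 agree with each other. The inline construction of `df` and the names `opt2`, `opt3`, and `total` are assumptions for illustration. Unlike the original Option 3, it keeps the total in a separate Series instead of adding a column to `df`.

```python
import numpy as np
import pandas as pd

# Only the columns needed for the per-major weighted average.
df = pd.DataFrame({
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# Option 2: weighted average per group via np.average.
opt2 = df.groupby("major")[["gpa", "num"]].apply(
    lambda g: np.average(g["gpa"], weights=g["num"])
)

# Option 3: sum(gpa * num) / sum(num) per group.
# For major b this is (2.63 * 48 + 3.72 * 72) / (48 + 72).
total = df["gpa"] * df["num"]
opt3 = total.groupby(df["major"]).sum() / df.groupby("major")["num"].sum()

print(opt2.round(3))
print(opt3.round(3))
print(np.allclose(opt2, opt3))
```

Both results are Series indexed by major, whereas the Option 1 result is a DataFrame with a gpa column, which is the shape shown in the output above.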
