Recalculate mean considering each count

Views: 48 | Published: 2021/5/13 19:47:27 | Tags: python, pandas, dataframe, group-by


Question

Suppose the dataframe looks like this:

index yearmon   college major   gpa     num
0     20140401  1       a       3.36    29
1     20180401  2       b       2.63    48
2     20160401  3       c       3.23    55
3     20170401  4       d       4.22    1
4     20140401  3       b       3.72    72

which gives the average GPA for each year, each college, and each major.

I want to build a new dataset that holds the average GPA per major only.

For example, major b has two rows from different times, so I need to compute a new GPA average that takes num (the number of students) into account.

I have tried the groupby function, but it only computes a plain average, as if every row had the same num (it ignores the num variable).

Is there a way to solve this problem?
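To make the gap concrete, here is a minimal sketch that rebuilds the sample rows from the table above and compares the plain per-major mean with the student-weighted value for major b. The variable names `plain` and `wanted_b` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "yearmon": [20140401, 20180401, 20160401, 20170401, 20140401],
    "college": [1, 2, 3, 4, 3],
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# Plain groupby mean: every row counts once, regardless of num.
plain = df.groupby("major")["gpa"].mean()
print(plain["b"])   # (2.63 + 3.72) / 2, about 3.175

# Desired value for major b: each gpa weighted by its student count.
wanted_b = (2.63 * 48 + 3.72 * 72) / (48 + 72)
print(wanted_b)     # about 3.284
```

The two numbers differ because the row with 72 students should pull the average further toward 3.72 than the row with 48 students pulls it toward 2.63.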

Recommended answer

A lazy way, given that the numbers of students are integers:

(df.loc[df.index.repeat(df['num']), ['major', 'gpa']]
   .groupby('major').mean()
)
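The row-repetition approach can be tried end to end with the sketch below. It assumes the sample data from the question, a unique index, and non-negative integer values in num; the names `expanded` and `result` are illustrative. Each row is duplicated num times, so an ordinary mean over the expanded frame equals the weighted mean.

```python
import pandas as pd

# Only the columns needed for this step, copied from the question.
df = pd.DataFrame({
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# Repeat each index label num times, then select those (repeated) rows.
expanded = df.loc[df.index.repeat(df["num"]), ["major", "gpa"]]
print(len(expanded))  # 205 rows, the sum of num

# An unweighted mean over the expanded rows is the weighted mean.
result = expanded.groupby("major").mean()
print(result)
```

The intermediate frame has one row per student, so memory use grows with the total of num; that is likely why the answer calls this the lazy way.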

Option 2: groupby().apply() with np.average:

import numpy as np

(df.groupby('major')
   .apply(lambda x: np.average(x['gpa'], weights=x['num']))
)
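A self-contained variant of option 2 is sketched below, again assuming the sample data from the question. Selecting the gpa and num columns before apply is an addition not in the original answer: some recent pandas versions emit a deprecation warning when apply operates on the grouping column, and the explicit selection sidesteps it. The name `weighted` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# For each major, average gpa using num as the weights.
# The lambda receives a sub-DataFrame holding only gpa and num.
weighted = (
    df.groupby("major")[["gpa", "num"]]
      .apply(lambda g: np.average(g["gpa"], weights=g["num"]))
)
print(weighted)
```

The result is a Series indexed by major. Note that np.average raises ZeroDivisionError if the weights of a group sum to zero, so groups with no students would need separate handling.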

Option 3: The most complicated but best-performing way is to compute the total score per row and calculate the average manually:

df['total'] = df['gpa'] * df['num']
groups = df.groupby('major')
out = groups['total'].sum()/groups['num'].sum()

Output:

         gpa
major       
a      3.360
b      3.284
c      3.230
d      4.220
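Option 3 above adds a total column to df in place. If that side effect is unwanted, one possible variation is to build the column on a temporary copy with assign, as in this sketch. It assumes the sample data from the question; the names `sums` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# assign returns a new frame, so df itself keeps its original columns.
sums = (
    df.assign(total=df["gpa"] * df["num"])
      .groupby("major")[["total", "num"]]
      .sum()
)

# Weighted mean = sum(gpa * num) / sum(num) for each major.
out = (sums["total"] / sums["num"]).rename("gpa")
print(out)
```

Both sums are computed in a single vectorized groupby pass, which avoids the per-group Python function call that apply makes.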
