Recalculate mean considering each count
Question
Suppose the data frame looks like this:
index yearmon college major gpa num
0 20140401 1 a 3.36 29
1 20180401 2 b 2.63 48
2 20160401 3 c 3.23 55
3 20170401 4 d 4.22 1
4 20140401 3 b 3.72 72
It gives the average GPA for each year, each college, and each major.
I want to build a new data set with the average GPA per major only.
For example, major b has two rows from different times,
so I have to compute a new GPA average that takes num (the number of students) into account.
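What is being asked for is a weighted mean. As a sketch, writing g_i for the gpa of row i and n_i for its num (these symbols are introduced here for illustration and are not in the original post), the per-major average and the arithmetic for major b from the table above would be:

```latex
\bar{g}_{\text{major}} = \frac{\sum_{i \in \text{major}} n_i \, g_i}{\sum_{i \in \text{major}} n_i},
\qquad
\bar{g}_{b} = \frac{48 \cdot 2.63 + 72 \cdot 3.72}{48 + 72} = \frac{394.08}{120} = 3.284
```

An unweighted mean of the same two rows would give (2.63 + 3.72) / 2 = 3.175 instead.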
I tried the groupby function, but it averages the rows as if they all had the same num (it does not take the num variable into account).
Is there a way to solve this problem?
Recommended answer
Option 1: a lazy way, given that the numbers of students are integers, is to repeat each row num times and take the ordinary mean:
(df.loc[df.index.repeat(df['num']), ['major', 'gpa']]
.groupby('major').mean()
)
Option 2: groupby().apply() with np.average:
import numpy as np

# weighted average of gpa within each major, using num as the weights
(df.groupby('major')
.apply(lambda x: np.average(x['gpa'], weights=x['num']))
)
Option 3: the most complicated but best-performing approach is to compute the total score for each row and calculate the average manually:
df['total'] = df['gpa'] * df['num']
groups = df.groupby('major')
out = groups['total'].sum()/groups['num'].sum()
Output:
gpa
major
a 3.360
b 3.284
c 3.230
d 4.220
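To try the three options side by side, here is a minimal self-contained sketch that rebuilds the sample data from the question and checks that they agree. The variable names (unweighted, by_repeat, by_average, by_totals) are made up for this illustration, and the column selection before apply() in Option 2 is a small variation on the answer's code, not the answer's exact form.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "yearmon": [20140401, 20180401, 20160401, 20170401, 20140401],
    "college": [1, 2, 3, 4, 3],
    "major": ["a", "b", "c", "d", "b"],
    "gpa": [3.36, 2.63, 3.23, 4.22, 3.72],
    "num": [29, 48, 55, 1, 72],
})

# Plain groupby mean: treats every row equally and ignores num
unweighted = df.groupby("major")["gpa"].mean()

# Option 1: repeat each row num times, then take the ordinary mean
by_repeat = (
    df.loc[df.index.repeat(df["num"]), ["major", "gpa"]]
    .groupby("major")["gpa"]
    .mean()
)

# Option 2: np.average with weights inside apply()
# (only the gpa and num columns are passed to the lambda)
by_average = df.groupby("major")[["gpa", "num"]].apply(
    lambda g: np.average(g["gpa"], weights=g["num"])
)

# Option 3: sum of gpa * num divided by sum of num, without mutating df
sums = (
    df.assign(total=df["gpa"] * df["num"])
    .groupby("major")[["total", "num"]]
    .sum()
)
by_totals = sums["total"] / sums["num"]

print(pd.DataFrame({"unweighted": unweighted, "weighted": by_totals}))
```

For major b the unweighted mean is 3.175, while all three weighted versions give 3.284. Only major b differs, because it is the only major with more than one row. Using assign() here avoids adding a permanent total column to df, which the original Option 3 does.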