How to use groupby objects to get sums of other columns?
Problem description
I'm playing with an MLB data set from the web to help me learn. The dataframe looks like:
            Player           Position   Salary  Year
0        Mike Witt            Pitcher  1400000  1988
1  George Hendrick         Outfielder   989333  1988
2      Chili Davis         Outfielder   950000  1988
3    Brian Downing  Designated Hitter   900000  1988
4        Bob Boone            Catcher   883000  1988
.
.
.
As an experiment, I'm trying to find the pitcher who has accumulated the highest total salary over their career. mlb is the dataframe.
So far I have tried:
mask = mlb.Position == "Pitcher"
pitchers = mlb[mask]
pitcher_groups = pitchers.groupby("Player")
I'm not sure how to proceed with the groupby object. I know I need to find the salary sum in each group, and do some sort of comparison... How do I do this without for loops?
Solution
Just do:
pitcher_groups['Salary'].sum()
This sums the salary column on the groupby object.
In [57]:
df[df['Position']=='Pitcher'].groupby('Player')['Salary'].sum()
Out[57]:
Player
Mike Witt 1400000
Name: Salary, dtype: int64
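To get from the per-player sums to the single highest-paid pitcher, one option is to call idxmax() on the resulting Series, or sort it if you want the full ranking. The sketch below assumes the column names from the question; the players "Pitcher A", "Pitcher B" and "Outfielder C" and their salaries are made-up placeholders, not rows from the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the mlb dataframe: names and salaries are invented
mlb = pd.DataFrame({
    "Player": ["Pitcher A", "Pitcher A", "Pitcher B", "Outfielder C"],
    "Position": ["Pitcher", "Pitcher", "Pitcher", "Outfielder"],
    "Salary": [1400000, 1300000, 1000000, 950000],
    "Year": [1988, 1989, 1988, 1988],
})

# Keep only pitchers, then sum Salary per player (no explicit loop needed)
pitchers = mlb[mlb["Position"] == "Pitcher"]
totals = pitchers.groupby("Player")["Salary"].sum()

# idxmax() returns the index label (the player) holding the largest total
top_pitcher = totals.idxmax()
top_total = totals.max()

# Alternatively, rank every pitcher by career total, highest first
ranked = totals.sort_values(ascending=False)

print(top_pitcher, top_total)
print(ranked)
```

Selecting ['Salary'] before calling sum() keeps the aggregation to the one column you care about, so other numeric columns such as Year are not summed along with it.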