How to use groupby objects to get sums of other columns?


Problem description

I'm playing with an MLB data set from the web to help me learn. The dataframe looks like:

    Player             Position          Salary     Year
0   Mike Witt          Pitcher           1400000    1988
1   George Hendrick    Outfielder        989333     1988
2   Chili Davis        Outfielder        950000     1988
3   Brian Downing      Designated Hitter 900000     1988
4   Bob Boone          Catcher           883000     1988
.
. 
.

As an experiment, I'm trying to find the pitcher who has accumulated the highest total salary over their career. Here, mlb is the dataframe.

So far I have tried:

mask = mlb.Position == "Pitcher"
pitchers = mlb[mask]
pitcher_groups = pitchers.groupby("Player")

I'm not sure how to proceed with the groupby object. I know I need to find the salary sum in each group, and do some sort of comparison... How do I do this without for loops?

Solution

Just do:

pitcher_groups['Salary'].sum()

This selects the Salary column from the groupby object and sums it within each group, giving one total per Player.

In [57]: df[df['Position']=='Pitcher'].groupby('Player')['Salary'].sum()
Out[57]:
Player
Mike Witt    1400000
Name: Salary, dtype: int64
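
The summed result is a Series indexed by Player, so the comparison the question asks about can be done without a loop as well. Here is a minimal sketch, assuming the column names shown in the question; the "Example Pitcher" rows are made up purely for illustration and are not part of the original data set.

```python
import pandas as pd

# Small stand-in for the mlb dataframe. The "Example Pitcher" rows are
# hypothetical; the other two rows come from the excerpt in the question.
mlb = pd.DataFrame({
    "Player": ["Mike Witt", "Chili Davis", "Example Pitcher", "Example Pitcher"],
    "Position": ["Pitcher", "Outfielder", "Pitcher", "Pitcher"],
    "Salary": [1400000, 950000, 800000, 900000],
    "Year": [1988, 1988, 1988, 1989],
})

# Total salary per pitcher across all years in the data.
career_totals = (
    mlb[mlb["Position"] == "Pitcher"]
    .groupby("Player")["Salary"]
    .sum()
)

# idxmax returns the index label (the player name) of the largest total.
top_pitcher = career_totals.idxmax()
top_total = career_totals.max()

# Alternatively, sort the totals to see the full ranking.
ranked = career_totals.sort_values(ascending=False)

print(top_pitcher, top_total)
```

Using idxmax gives only the first player with the largest total; if ties matter, inspecting the sorted Series may be the safer choice.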
