Pandas groupby category, rating, get top value from each category?


Question

First question on SO, very new to pandas and still a little shaky on the terminology: I'm trying to figure out the proper syntax/sequence of operations on a dataframe to group by column B, find the max (or min) value of column C within each group, and retrieve the corresponding value from column A.

Suppose this is my dataframe:

name     type      votes     
bob       dog        10
pete      cat         8
fluffy    dog         5
max       cat         9

Using df.groupby('type').votes.agg('max') returns:

dog     10
cat      9

So far, so good. However, I'd like to figure out how to return this:

dog    10    bob
cat     9    max 

I've gotten as far as df.groupby(['type', 'votes']).name.agg('max'), though that returns

dog   5    fluffy
      10   bob
cat   8    pete
      9    max

... which is fine for this pretend dataframe, but doesn't quite help when working with a much larger one, because it still lists every (type, votes) pair rather than only the top row per type.

Thanks very much!

Answer

If df has an index with no duplicate values, then you can use idxmax to return the index label of the maximum row for each group. Then use df.loc to select the entire row:

In [322]: df.loc[df.groupby('type').votes.agg('idxmax')]
Out[322]: 
  name type  votes
3  max  cat      9
0  bob  dog     10
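
For reference, here is a self-contained sketch of the same idea, assuming the question's four-row dataframe with its default integer index (0 to 3). The column selection and the set_index call at the end are my additions, used only to reshape the result into the type / votes / name layout the question asks for.

```python
import pandas as pd

# The question's sample data, with the default (unique) integer index 0..3.
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]})

# Index label of the highest-votes row within each type.
best = df.groupby('type')['votes'].idxmax()

# Select those rows, then arrange them as type -> (votes, name).
# The column order and set_index('type') are illustrative choices.
result = df.loc[best, ['type', 'votes', 'name']].set_index('type')
print(result)
```

Swapping idxmax for idxmin gives the lowest-votes row per type instead. Note that if two rows tie for the maximum, idxmax keeps only the first of them.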

If df.index has duplicate values, i.e. is not a unique index, then make the index unique first:

df = df.reset_index()

Then use idxmax:

result = df.loc[df.groupby('type').votes.agg('idxmax')]

If you really need to, you can return df to its original state:

df = df.set_index(['index'], drop=True)
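
One detail worth checking in that round trip: reset_index() stores an unnamed index in a column called 'index', so after set_index the restored index carries the name 'index'. The sketch below assumes a small frame with a duplicated, unnamed index; the rename_axis(None) step is my addition for the case where the original index had no name.

```python
import pandas as pd

# Hypothetical frame with a non-unique, unnamed index.
original = pd.DataFrame({'type': ['dog', 'cat', 'dog', 'cat'],
                         'votes': [10, 8, 5, 9]},
                        index=list('AABB'))

flat = original.reset_index()          # old index becomes the 'index' column
result = flat.loc[flat.groupby('type')['votes'].idxmax()]

restored = flat.set_index('index')     # labels are back, index is named 'index'
restored = restored.rename_axis(None)  # drop the name to match the original

print(restored.index.name)
print(restored.equals(original))
```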

In general, though, life is much better with a unique index.

Here is an example showing what goes wrong when df does not have a unique index. Suppose the index labels are A, A, B, B:

import pandas as pd
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]}, 
                  index=list('AABB'))
print(df)
#      name type  votes
# A     bob  dog     10
# A    pete  cat      8
# B  fluffy  dog      5
# B     max  cat      9

idxmax returns the index labels A and B:

print(df.groupby('type').votes.agg('idxmax'))
# type
# cat    B
# dog    A
# Name: votes, dtype: object

But A and B do not uniquely specify the desired rows. df.loc[...] returns all rows whose index label is A or B, which here is every row:

print(df.loc[df.groupby('type').votes.agg('idxmax')])
#      name type  votes
# B  fluffy  dog      5
# B     max  cat      9
# A     bob  dog     10
# A    pete  cat      8

In contrast, if we reset the index:

df = df.reset_index()
#   index    name type  votes
# 0     A     bob  dog     10
# 1     A    pete  cat      8
# 2     B  fluffy  dog      5
# 3     B     max  cat      9

then df.loc can be used to select the desired rows:

print(df.groupby('type').votes.agg('idxmax'))
# type
# cat    3
# dog    0
# Name: votes, dtype: int64

print(df.loc[df.groupby('type').votes.agg('idxmax')])
#   index name type  votes
# 3     B  max  cat      9
# 0     A  bob  dog     10
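
The steps above could be wrapped in a small helper that checks the index before calling idxmax. This is only a sketch: top_row_per_group is a hypothetical name, not a pandas function, and it assumes you are willing to discard the original labels (reset_index(drop=True)) when they are not unique.

```python
import pandas as pd

def top_row_per_group(df, group_col, value_col):
    """Return the row with the largest value_col within each group_col group.

    Hypothetical helper for illustration; not part of pandas.
    """
    # idxmax returns index labels, so the labels must identify rows uniquely.
    # Assumption: when they do not, the original labels can be thrown away.
    if not df.index.is_unique:
        df = df.reset_index(drop=True)
    return df.loc[df.groupby(group_col)[value_col].idxmax()]

# The AABB-indexed frame from the example above.
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]},
                  index=list('AABB'))

top = top_row_per_group(df, 'type', 'votes')
print(top)
```

If the original labels matter, use reset_index() without drop=True instead, so they survive as an ordinary column as in the example above.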
