Pandas: grouping and aggregation with multiple functions

Views: 100 Published: 2020/6/2 20:36:15 Tags: python pandas dataframe aggregate

Problem description

I have a pandas dataframe defined as follows:

import pandas as pd

headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']
data = [
    ['A', 1, 'x', 1.40, 0.59],
    ['A', 1, 'y', 9.19, 0.52],
    ['A', 2, 'x', 8.82, 0.80],
    ['A', 2, 'y', 7.18, 0.41],
    ['B', 1, 'x', 1.38, 0.22],
    ['B', 1, 'y', 7.14, 0.10],
    ['B', 2, 'x', 9.12, 0.28],
    ['B', 2, 'y', 4.11, 0.97],
]
df = pd.DataFrame(data, columns=headers)

which looks like this in console output:

  Group  Element Case  Score  Evaluation
0     A        1    x   1.40        0.59
1     A        1    y   9.19        0.52
2     A        2    x   8.82        0.80
3     A        2    y   7.18        0.41
4     B        1    x   1.38        0.22
5     B        1    y   7.14        0.10
6     B        2    x   9.12        0.28
7     B        2    y   4.11        0.97

Problem

I'd like to perform a grouping-and-aggregation operation on df that will give me the following result dataframe:

  Group  Max_score_value  Max_score_element  Max_score_case  Min_evaluation
0     A             9.19                  1               y            0.41 
1     B             9.12                  2               x            0.10

To clarify in more detail: I'd like to group by the Group column, and then apply aggregation to get the following result columns:

Max_score_value: the group-maximum value from the Score column.
Max_score_element: the value from the Element column that corresponds to the group-maximum Score value.
Max_score_case: the value from the Case column that corresponds to the group-maximum Score value.
Min_evaluation: the group-minimum value from the Evaluation column.

I've come up with the following code for the grouping-and-aggregation:

result = (
    df.set_index(['Element', 'Case'])
    .groupby('Group')
    .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
    .reset_index()
)
print(result)

which outputs:

  Group Score         Evaluation
          max  idxmax        min
0     A  9.19  (1, y)       0.41
1     B  9.12  (2, x)       0.10
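
The two-row header in this output is a two-level column index (a MultiIndex), which is why the result does not yet have flat column names. As a minimal sketch, rebuilding the same data and printing the column labels shows the (top level, second level) tuples that any reshaping step has to deal with. The variable name `column_labels` is only illustrative.

```python
import pandas as pd

headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']
data = [
    ['A', 1, 'x', 1.40, 0.59],
    ['A', 1, 'y', 9.19, 0.52],
    ['A', 2, 'x', 8.82, 0.80],
    ['A', 2, 'y', 7.18, 0.41],
    ['B', 1, 'x', 1.38, 0.22],
    ['B', 1, 'y', 7.14, 0.10],
    ['B', 2, 'x', 9.12, 0.28],
    ['B', 2, 'y', 4.11, 0.97],
]
df = pd.DataFrame(data, columns=headers)

result = (
    df.set_index(['Element', 'Case'])
    .groupby('Group')
    .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
    .reset_index()
)

# Each column label is a tuple; 'Group' gets an empty second level
# because reset_index() pads it to match the two-level header.
column_labels = list(result.columns)
print(column_labels)
```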

As you can see, the basic data is there, but it's not yet in the format that I need. It's this last step that I'm struggling with. Does anyone here have some good ideas for generating a result dataframe in the format that I'm looking for?

Recommended answer

Starting from the result data frame, you can transform it into the format you need in two steps:

# collapse multi index column to single level column
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

# split the idxmax column into two columns
result = result.assign(
    max_score_element = result.idxmax_Score.str[0],
    max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')  # positional axis (.drop(name, 1)) fails on pandas 2.x

result

#Group  max_Score   min_Evaluation  max_score_case  max_score_element
#0   A       9.19             0.41               y                  1
#1   B       9.12             0.10               x                  2
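
For reference, here is a self-contained sketch that runs the whole pipeline and ends with the exact column names and order requested in the question. It is a variation on the steps above, not the answer's own code: the tuple column is split with `tolist()` rather than `.str[...]`, and the names `split` and `final` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 1, 'x', 1.40, 0.59],
        ['A', 1, 'y', 9.19, 0.52],
        ['A', 2, 'x', 8.82, 0.80],
        ['A', 2, 'y', 7.18, 0.41],
        ['B', 1, 'x', 1.38, 0.22],
        ['B', 1, 'y', 7.14, 0.10],
        ['B', 2, 'x', 9.12, 0.28],
        ['B', 2, 'y', 4.11, 0.97],
    ],
    columns=['Group', 'Element', 'Case', 'Score', 'Evaluation'],
)

result = (
    df.set_index(['Element', 'Case'])
    .groupby('Group')
    .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
    .reset_index()
)

# Step 1: flatten the two-level header, e.g. ('Score', 'max') -> 'max_Score'
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

# Step 2: split the (Element, Case) tuples into two ordinary columns
split = pd.DataFrame(
    result['idxmax_Score'].tolist(),
    index=result.index,
    columns=['Max_score_element', 'Max_score_case'],
)

# Rename and reorder to match the layout asked for in the question
final = (
    result.drop(columns='idxmax_Score')
    .join(split)
    .rename(columns={'max_Score': 'Max_score_value',
                     'min_Evaluation': 'Min_evaluation'})
)
final = final[['Group', 'Max_score_value', 'Max_score_element',
               'Max_score_case', 'Min_evaluation']]
print(final)
```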

An alternative that starts from the original df and uses join. It may not be as efficient, but it is less verbose, and it is similar to @tarashypka's idea:

(df.groupby('Group')
   .agg({'Score': 'idxmax', 'Evaluation': 'min'})
   .set_index('Score')
   .join(df.drop(columns='Evaluation'))
   .reset_index(drop=True))

#Evaluation  Group  Element   Case  Score
#0     0.41      A        1      y   9.19
#1     0.10      B        2      x   9.12
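
A related sketch, not taken from the answers here: the same row-lookup idea can be written with `idxmax` and `.loc`, so that the row holding each group's maximum Score is selected directly and the group minimum of Evaluation is attached afterwards. The names `best_rows` and `out` are only illustrative, and the sketch assumes the default unique integer index that the example df has.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['A', 1, 'x', 1.40, 0.59],
        ['A', 1, 'y', 9.19, 0.52],
        ['A', 2, 'x', 8.82, 0.80],
        ['A', 2, 'y', 7.18, 0.41],
        ['B', 1, 'x', 1.38, 0.22],
        ['B', 1, 'y', 7.14, 0.10],
        ['B', 2, 'x', 9.12, 0.28],
        ['B', 2, 'y', 4.11, 0.97],
    ],
    columns=['Group', 'Element', 'Case', 'Score', 'Evaluation'],
)

# Row labels of the maximum Score within each group
max_rows = df.groupby('Group')['Score'].idxmax()

# Pull those rows and rename the columns as requested in the question
best_rows = (
    df.loc[max_rows, ['Group', 'Score', 'Element', 'Case']]
    .rename(columns={'Score': 'Max_score_value',
                     'Element': 'Max_score_element',
                     'Case': 'Max_score_case'})
    .set_index('Group')
)

# The group minimum is indexed by Group, so assign() aligns it by label
out = best_rows.assign(
    Min_evaluation=df.groupby('Group')['Evaluation'].min()
).reset_index()
print(out)
```

Because the lookup returns whole rows, Element and Case never have to be packed into tuples and split apart again.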

Naive timing with the example data set:

%%timeit 
(df.groupby('Group')
 .agg({'Score': 'idxmax', 'Evaluation': 'min'})
 .set_index('Score')
 .join(df.drop(columns='Evaluation'))
 .reset_index(drop=True))
# 100 loops, best of 3: 3.47 ms per loop

%%timeit
result = (
    df.set_index(['Element', 'Case'])
    .groupby('Group')
    .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
    .reset_index()
)

result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]

result = result.assign(
    max_score_element = result.idxmax_Score.str[0],
    max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')
# 100 loops, best of 3: 7.61 ms per loop
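
`%%timeit` is an IPython/Jupyter cell magic, so the cells above will not run as a plain script. As a rough sketch, the same comparison can be made with the standard-library `timeit` module. The function names `join_approach` and `flatten_approach` and the run count are assumptions made for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    [
        ['A', 1, 'x', 1.40, 0.59],
        ['A', 1, 'y', 9.19, 0.52],
        ['A', 2, 'x', 8.82, 0.80],
        ['A', 2, 'y', 7.18, 0.41],
        ['B', 1, 'x', 1.38, 0.22],
        ['B', 1, 'y', 7.14, 0.10],
        ['B', 2, 'x', 9.12, 0.28],
        ['B', 2, 'y', 4.11, 0.97],
    ],
    columns=['Group', 'Element', 'Case', 'Score', 'Evaluation'],
)


def join_approach(frame):
    """Find each group's max-Score row label, then join the other columns."""
    return (
        frame.groupby('Group')
        .agg({'Score': 'idxmax', 'Evaluation': 'min'})
        .set_index('Score')
        .join(frame.drop(columns='Evaluation'))
        .reset_index(drop=True)
    )


def flatten_approach(frame):
    """Aggregate with a two-level header, flatten it, then split the tuples."""
    result = (
        frame.set_index(['Element', 'Case'])
        .groupby('Group')
        .agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
        .reset_index()
    )
    result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]
    return result.assign(
        max_score_element=result['idxmax_Score'].str[0],
        max_score_case=result['idxmax_Score'].str[1],
    ).drop(columns='idxmax_Score')


runs = 20  # small, arbitrary repeat count to keep the script quick
join_ms = timeit.timeit(lambda: join_approach(df), number=runs) / runs * 1e3
flatten_ms = timeit.timeit(lambda: flatten_approach(df), number=runs) / runs * 1e3
print(f"join approach:    {join_ms:.2f} ms per call")
print(f"flatten approach: {flatten_ms:.2f} ms per call")
```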
