Naming returned columns in Pandas aggregate function?

Views: 249 · Posted: 2018/5/30 13:33:17 · Tags: python, group-by, pandas, aggregate-functions


Question

I'm having trouble with Pandas' groupby functionality. I've read the documentation, but I can't seem to figure out how to apply aggregate functions to multiple columns and have custom names for those columns.

This comes very close, but the data structure returned has nested column headings:
```python
data.groupby("Country").agg(
    {"column1": {"foo": sum},
     "column2": {"mean": np.mean, "std": np.std}})
```

(i.e., I want to take the mean and std of column2, but return those columns as "mean" and "std".)

What am I missing?
Solution
This will drop the outermost level from the hierarchical column index:
```python
df = data.groupby(...).agg(...)
df.columns = df.columns.droplevel(0)
```
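A minimal, self-contained sketch of `droplevel(0)` on a small made-up DataFrame. The `Country`, `column1`, and `column2` names are borrowed from the question, and the values are illustrative only. It also shows one caveat: if two source columns are aggregated with the same function, dropping the outer level leaves duplicate column labels.

```python
import pandas as pd

# Toy data for illustration only: column names follow the question, values are made up.
data = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "column1": [1, 2, 3, 4],
    "column2": [10.0, 20.0, 30.0, 50.0],
})

# A dict of lists yields two-level columns: (source column, function name).
df = data.groupby("Country").agg({"column1": ["sum"], "column2": ["mean", "std"]})
print(df.columns.tolist())
# [('column1', 'sum'), ('column2', 'mean'), ('column2', 'std')]

# droplevel(0) discards the source-column level and keeps only the function names.
df.columns = df.columns.droplevel(0)
print(df.columns.tolist())
# ['sum', 'mean', 'std']

# Caveat: two source columns aggregated with the same function collide.
dup = data.groupby("Country").agg({"column1": ["mean"], "column2": ["mean"]})
dup.columns = dup.columns.droplevel(0)
print(dup.columns.tolist())
# ['mean', 'mean']
```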
If you'd like to keep the outermost level, you can call ravel() on the multi-level column index and join each label tuple to form new labels:
```python
df.columns = ["_".join(x) for x in df.columns.ravel()]
```
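A self-contained sketch of the joining approach on a small made-up DataFrame (column names borrowed from the question, values illustrative). Two small departures from the answer, both assumptions of this sketch: it iterates over `df.columns` directly, which yields one `(column, function)` tuple per label, and it wraps each part in `str()` so the join does not fail if a level ever holds non-string labels.

```python
import pandas as pd

# Toy data for illustration only: column names follow the question, values are made up.
data = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "column1": [1, 2, 3, 4],
    "column2": [10.0, 20.0, 30.0, 50.0],
})

df = data.groupby("Country").agg({"column1": ["sum"], "column2": ["mean", "std"]})

# Each column label is a tuple such as ("column2", "mean"); join its parts with "_".
# map(str, ...) is defensive: "_".join raises TypeError on non-string parts.
df.columns = ["_".join(map(str, col)) for col in df.columns]
print(df.columns.tolist())
# ['column1_sum', 'column2_mean', 'column2_std']
```

Unlike `droplevel(0)`, the joined names stay unique as long as the original (column, function) pairs are unique.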

For example:
```python
import pandas as pd
import pandas.rpy.common as com  # note: pandas.rpy was removed in later pandas releases
import numpy as np

data = com.load_data('Loblolly')
print(data.head())
#     height  age Seed
# 1     4.51    3  301
# 15   10.89    5  301
# 29   28.72   10  301
# 43   41.74   15  301
# 57   52.70   20  301

df = data.groupby('Seed').agg(
    {'age': ['sum'],
     'height': ['mean', 'std']})
print(df.head())
#       age     height
#       sum        std       mean
# Seed
# 301    78  22.638417  33.246667
# 303    78  23.499706  34.106667
# 305    78  23.927090  35.115000
# 307    78  22.222266  31.328333
# 309    78  23.132574  33.781667

df.columns = df.columns.droplevel(0)
print(df.head())
```
yields
```
      sum        std       mean
Seed
301    78  22.638417  33.246667
303    78  23.499706  34.106667
305    78  23.927090  35.115000
307    78  22.222266  31.328333
309    78  23.132574  33.781667
```
Alternatively, to keep the first level of the column index:

```python
df = data.groupby('Seed').agg(
    {'age': ['sum'],
     'height': ['mean', 'std']})
df.columns = ["_".join(x) for x in df.columns.ravel()]
```
yields
```
      age_sum  height_std  height_mean
Seed
301        78   22.638417    33.246667
303        78   23.499706    34.106667
305        78   23.927090    35.115000
307        78   22.222266    31.328333
309        78   23.132574    33.781667
```
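On pandas 0.25 or later, a different technique, named aggregation, produces flat, custom-named columns directly, which is what the question asked for. It is not part of the answer above, which predates it. The dict-of-dicts renaming syntax used in the question was deprecated in pandas 0.20 and later removed; current releases reject it with a `SpecificationError`. A minimal sketch, assuming pandas 0.25+ and a small made-up DataFrame whose column names are borrowed from the question:

```python
import pandas as pd

# Toy data for illustration only: column names follow the question, values are made up.
data = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "column1": [1, 2, 3, 4],
    "column2": [10.0, 20.0, 30.0, 50.0],
})

# Named aggregation: each keyword becomes an output column name,
# and its value is a (source column, aggregation) pair.
df = data.groupby("Country").agg(
    foo=("column1", "sum"),
    mean=("column2", "mean"),
    std=("column2", "std"),
)
print(df.columns.tolist())
# ['foo', 'mean', 'std']
```

The result has a single column level, so no `droplevel` or join step is needed afterwards.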
