Naming returned columns in Pandas aggregate function?


Problem description

I'm having trouble with Pandas' groupby functionality. I've read the documentation, but I can't seem to figure out how to apply aggregate functions to multiple columns and have custom names for the resulting columns.

This comes very close, but the data structure returned has nested column headings:

data.groupby("Country").agg(
        {"column1": {"foo": sum}, "column2": {"mean": np.mean, "std": np.std}})

(i.e., I want to take the mean and std of column2, but return those columns as "mean" and "std".)

What am I missing?

Solution

This will drop the outermost level from the hierarchical column index:

df = data.groupby(...).agg(...)
df.columns = df.columns.droplevel(0)

If you'd like to keep the outermost level, you can use the ravel() method on the multi-level column index and join each (outer, inner) pair to form new labels:

df.columns = ["_".join(x) for x in df.columns.ravel()]
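
Both approaches can be tried without any external dataset. The following is a minimal sketch, assuming a recent pandas version; the toy DataFrame and the variable names `agg`, `dropped`, and `flat` are made up for illustration. It iterates over the column index directly rather than calling ravel(), which should yield the same (outer, inner) tuples.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
data = pd.DataFrame({
    "Seed": [301, 301, 303, 303],
    "age": [3, 5, 3, 5],
    "height": [4.0, 10.0, 5.0, 11.0],
})

# Aggregating with a dict of lists produces two-level (MultiIndex) columns:
# (age, sum), (height, mean), (height, std)
agg = data.groupby("Seed").agg({"age": ["sum"], "height": ["mean", "std"]})

# Option 1: drop the outer level, keeping only the function names.
dropped = agg.copy()
dropped.columns = dropped.columns.droplevel(0)

# Option 2: join both levels into flat labels such as "height_mean".
flat = agg.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(list(dropped.columns))
print(list(flat.columns))
```

Note that dropping the outer level can leave duplicate labels if the same function is applied to more than one column, so the joined form is usually the safer choice in that case.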


For example:

import pandas as pd
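# Note: pandas.rpy was removed in pandas 0.20; on newer versions the
# Loblolly dataset has to be loaded another way (for example via rpy2).
import pandas.rpy.common as com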
import numpy as np

data = com.load_data('Loblolly')
print(data.head())
#     height  age Seed
# 1     4.51    3  301
# 15   10.89    5  301
# 29   28.72   10  301
# 43   41.74   15  301
# 57   52.70   20  301

df = data.groupby('Seed').agg(
    {'age':['sum'],
     'height':['mean', 'std']})
print(df.head())
#       age     height           
#       sum        std       mean
# Seed                           
# 301    78  22.638417  33.246667
# 303    78  23.499706  34.106667
# 305    78  23.927090  35.115000
# 307    78  22.222266  31.328333
# 309    78  23.132574  33.781667

df.columns = df.columns.droplevel(0)
print(df.head())

yields

      sum        std       mean
Seed                           
301    78  22.638417  33.246667
303    78  23.499706  34.106667
305    78  23.927090  35.115000
307    78  22.222266  31.328333
309    78  23.132574  33.781667

Alternatively, to keep the first level of the index:

df = data.groupby('Seed').agg(
    {'age':['sum'],
     'height':['mean', 'std']})
df.columns = ["_".join(x) for x in df.columns.ravel()]

yields

      age_sum   height_std  height_mean
Seed                           
301        78    22.638417    33.246667
303        78    23.499706    34.106667
305        78    23.927090    35.115000
307        78    22.222266    31.328333
309        78    23.132574    33.781667
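
On newer pandas versions (0.25 and later), a different technique, named aggregation, can assign the output column names directly so that no nested header is created in the first place. This is not part of the answer above; it is a minimal sketch with a made-up toy DataFrame, and the output names `age_sum`, `height_mean`, and `height_std` are chosen here only to mirror the result shown above.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
data = pd.DataFrame({
    "Seed": [301, 301, 303, 303],
    "age": [3, 5, 3, 5],
    "height": [4.0, 10.0, 5.0, 11.0],
})

# Named aggregation: each keyword is the output column name, and each
# value is a (source column, aggregation function) pair.
result = data.groupby("Seed").agg(
    age_sum=("age", "sum"),
    height_mean=("height", "mean"),
    height_std=("height", "std"),
)

print(result)
```

The result has ordinary single-level columns, so neither droplevel() nor joining the levels is needed afterwards.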
