Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")


Question

I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]: 
      Amount      
         Sum Count
User              
user1   18.0     2
user2   20.5     3
user3   10.5     1

This generates the following warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

How can I avoid this?

Recommended answer

Use groupby apply and return a Series to rename columns

Use the groupby apply method to perform an aggregation that

  • renames the columns
  • allows spaces in the names
  • allows you to order the returned columns in any way you choose
  • allows interactions between columns
  • returns a single-level index, not a MultiIndex

To do this:

  • create a custom function that you pass to apply
  • this custom function is passed each group as a DataFrame
  • return a Series
  • the index of the Series will be the new columns

Create fake data

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})

Create a custom function that returns a Series. The variable x inside of my_agg is a DataFrame.

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

df.groupby('User').apply(my_agg)

The big downside is that this function will be much slower than agg for the cythonized aggregations.
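When every output column is a built-in aggregation of a single source column, one alternative is named aggregation, which keeps the work inside agg. This is only a sketch and assumes pandas 0.25 or later, where the keyword form new_name=(column, aggregation) is available. It does not cover columns that combine two inputs, such as 'Amount Score Sum' above. Names containing spaces are passed by unpacking a dict.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

# Each entry maps a new column name to (source column, aggregation).
# The result has a single-level column index, in the order given here.
fast = df.groupby("User").agg(**{
    "Amount mean": ("Amount", "mean"),
    "Amount std": ("Amount", "std"),
    "Score Max": ("Score", "max"),
    "Score Sum": ("Score", "sum"),
})
print(fast)
```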

Using a dictionary of dictionaries was deprecated because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output

       sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
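The list form names the columns after the functions (sum, count). If the capitalized names from the question are wanted, one simple option is to rename the columns afterwards. The sketch below uses the question's data, and the variable name renamed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a list of functions, then map the default names to the desired ones.
renamed = (df.groupby("User")["Amount"]
             .agg(["sum", "count"])
             .rename(columns={"sum": "Sum", "count": "Count"}))
print(renamed)
```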

It is still possible to use a dictionary to explicitly specify different aggregations for different columns, as in this example, which adds another numeric column named Other.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
              "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
              'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
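The dictionary-of-lists form returns MultiIndex columns. If flat column names are preferred, one possible approach is to join the two levels of each column label. The underscore separator and the variable name result are arbitrary choices for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

result = df.groupby("User").agg({"Amount": ["sum", "count"], "Other": ["max", "std"]})

# Each column label is a tuple such as ("Amount", "sum"); join it into one string.
result.columns = ["_".join(col) for col in result.columns]
print(result)
```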
