Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")
Question
I'm trying to do some aggregations on a pandas data frame. Here is some sample code:
import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})
df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})
Out[1]:
      Amount
         Sum Count
User
user1   18.0     2
user2   20.5     3
user3   10.5     1
This generates the following warning:
FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
How can I avoid this?
Recommended answer
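Use groupby apply and return a Series to rename columns
Use the groupby apply method to perform an aggregation that:
- renames the columns
- allows spaces in the names
- allows you to order the returned columns in any way you choose
- allows interactions between columns
- returns a single-level index instead of a MultiIndex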
To do this:
- Create a custom function that you pass to apply
- This custom function is passed each group as a DataFrame
- Return a Series
- The index of the Series will be the new columns
Create fake data
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   'Score': [9, 1, 8, 7, 7, 6, 9]})
Create a custom function that returns a Series. The variable x inside of my_agg is a DataFrame.
def my_agg(x):
    # x is the sub-DataFrame holding the rows of one group
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std': x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max': x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}
    # The Series index becomes the output columns; listing it fixes their order
    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])
Pass this custom function to the groupby apply method:
df.groupby('User').apply(my_agg)
The big downside is that apply with a custom Python function will be much slower than agg with the built-in (Cython-optimized) aggregations.
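If speed matters, one workaround is to stay inside agg: precompute any cross-column interaction as an extra column (called AmountScore here, a hypothetical helper name), aggregate with a dict of lists, then flatten the two-level column index into single strings. The sketch below assumes the fake data above and also runs the apply version for comparison. It selects only the Amount and Score columns before apply, since recent pandas versions warn when apply is handed the grouping column. This only works when every interaction can be written as a row-wise column followed by a built-in aggregation.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

COLUMNS = ["Amount range", "Amount std", "Amount mean",
           "Score Sum", "Score Max", "Amount Score Sum"]

# Slow path: the custom-function approach from this answer
def my_agg(x):
    names = {
        "Amount mean": x["Amount"].mean(),
        "Amount std": x["Amount"].std(),
        "Amount range": x["Amount"].max() - x["Amount"].min(),
        "Score Max": x["Score"].max(),
        "Score Sum": x["Score"].sum(),
        "Amount Score Sum": (x["Amount"] * x["Score"]).sum()}
    return pd.Series(names, index=COLUMNS)

slow = df.groupby("User")[["Amount", "Score"]].apply(my_agg)

# Faster path: precompute the row-wise product (hypothetical helper column)
tmp = df.assign(AmountScore=df["Amount"] * df["Score"])
fast = tmp.groupby("User").agg({"Amount": ["mean", "std", "min", "max"],
                                "Score": ["max", "sum"],
                                "AmountScore": ["sum"]})
# Flatten the MultiIndex columns, e.g. ("Amount", "mean") -> "Amount mean"
fast.columns = [f"{col} {func}" for col, func in fast.columns]
fast["Amount range"] = fast["Amount max"] - fast["Amount min"]
fast = fast.rename(columns={"Score max": "Score Max", "Score sum": "Score Sum",
                            "AmountScore sum": "Amount Score Sum"})
# Keep the same column order as my_agg
fast = fast[COLUMNS]
print(fast)
```

Both frames hold the same values; the agg version keeps integer dtypes for the Score columns, while apply upcasts everything to float.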
Using a dictionary of dictionaries was deprecated (and later removed) because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. In the meantime, you can select the aggregating column directly after the groupby call and pass a list of all the aggregation functions you wish to apply.
df.groupby('User')['Amount'].agg(['sum', 'count'])
Output
        sum  count
User
user1  18.0      2
user2  20.5      3
user3  10.5      1
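The list form names the output columns after the functions (sum, count). To get the Sum and Count names from the question without the deprecated dict renaming, a minimal sketch is to rename the columns afterwards with DataFrame.rename, using the question's data:

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a plain list of functions, then rename the function-named columns
result = (df.groupby("User")["Amount"]
            .agg(["sum", "count"])
            .rename(columns={"sum": "Sum", "count": "Count"}))
print(result)
```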
A dictionary can still be used to apply different aggregations to different columns. For example, suppose there is another numeric column named Other:
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   'Other': [1, 2, 3, 4, 5, 6]})
df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})
Output
      Amount       Other
         sum count   max       std
User
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
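In later pandas versions (0.25 and up), agg also accepts "named aggregation", where each keyword argument has the form output_name=(column, aggregation). This yields flat, custom-named columns in one call and is the usual replacement for dict-of-dicts renaming, which pandas 1.0 removed (it raises a SpecificationError there). A minimal sketch on the same data; the output names such as Other_max are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

# Named aggregation: output_name=(input_column, aggregation_function)
result = df.groupby("User").agg(
    Sum=("Amount", "sum"),
    Count=("Amount", "count"),
    Other_max=("Other", "max"),
    Other_std=("Other", "std"),
)

# Output names that are not valid Python identifiers (e.g. containing spaces)
# can be passed by unpacking a dict of keyword arguments
spaced = df.groupby("User").agg(**{
    "Amount Sum": ("Amount", "sum"),
    "Other Max": ("Other", "max"),
})
print(result)
print(spaced)
```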