python pandas group by and aggregate columns
Question
I am using pandas version 0.23.0. I want to use the data frame group by function to generate new aggregated columns using [lambda] functions.
My data frame looks like
ID Flag Amount User
1 1 100 123345
1 1 55 123346
2 0 20 123346
2 0 30 123347
3 0 50 123348
I want to generate a table which looks like
ID Flag0_Count Flag1_Count Flag0_Amount_SUM Flag1_Amount_SUM Flag0_User_Count Flag1_User_Count
1 0 2 0 155 0 2
2 2 0 50 0 2 0
3 1 0 50 0 1 0
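Here:
- Flag0_Count is the count of rows where Flag = 0
- Flag1_Count is the count of rows where Flag = 1
- Flag0_Amount_SUM is the SUM of Amount where Flag = 0
- Flag1_Amount_SUM is the SUM of Amount where Flag = 1
- Flag0_User_Count is the distinct count of User where Flag = 0
- Flag1_User_Count is the distinct count of User where Flag = 1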
I tried something like
df.groupby(["ID"])["Flag"].apply(lambda x: sum(x==0)).reset_index()
but it creates a new data frame. This means I will have to do this for all columns and then merge them together into a new data frame. Is there an easier way to accomplish this?
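For illustration, the column-by-column approach described above might be sketched as follows. The sample frame is rebuilt from the table in the question, only the two count columns are shown, and the names `flag0_count`, `flag1_count` and `merged` are hypothetical.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Flag": [1, 1, 0, 0, 0],
    "Amount": [100, 55, 20, 30, 50],
    "User": [123345, 123346, 123346, 123347, 123348],
})

# One groupby per output column; each call returns a separate Series indexed by ID.
flag0_count = (df.groupby("ID")["Flag"]
                 .apply(lambda x: (x == 0).sum())
                 .rename("Flag0_Count"))
flag1_count = (df.groupby("ID")["Flag"]
                 .apply(lambda x: (x == 1).sum())
                 .rename("Flag1_Count"))

# The pieces then have to be combined by hand, aligned on the shared ID index.
merged = pd.concat([flag0_count, flag1_count], axis=1).reset_index()
print(merged)
```

Repeating this for the sum and distinct-user columns would need four more groupby calls, which is the repetition the question wants to avoid.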
Recommended answer
Use DataFrameGroupBy.agg with a dictionary that maps column names to aggregate functions, then reshape with unstack, flatten the MultiIndex in the columns, rename the columns, and finally call reset_index:
df = (df.groupby(["ID", "Flag"])
        .agg({'Flag': 'size', 'Amount': 'sum', 'User': 'nunique'})
        .unstack(fill_value=0))

# flatten the MultiIndex columns, e.g. ('Flag', 0) -> 'Flag0'
# Python 3.6+
df.columns = [f'{i}{j}' for i, j in df.columns]
# Python below 3.6
# df.columns = ['{}{}'.format(i, j) for i, j in df.columns]

# map the flattened names to the requested column names
d = {'Flag0': 'Flag0_Count',
     'Flag1': 'Flag1_Count',
     'Amount0': 'Flag0_Amount_SUM',
     'Amount1': 'Flag1_Amount_SUM',
     'User0': 'Flag0_User_Count',
     'User1': 'Flag1_User_Count',
     }
df = df.rename(columns=d).reset_index()
print(df)
ID Flag0_Count Flag1_Count Flag0_Amount_SUM Flag1_Amount_SUM \
0 1 0 2 0 155
1 2 2 0 50 0
2 3 1 0 50 0
Flag0_User_Count Flag1_User_Count
0 0 2
1 2 0
2 1 0
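As a self-contained variant, the same reshaping can be sketched with named aggregation, which builds the final column names without a rename dictionary. This is an alternative and not the code from the answer. It assumes pandas 0.25 or later (the question uses 0.23.0, where named aggregation is not available), and the names `wide`, `Count`, `Amount_SUM` and `User_Count` are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3],
    "Flag": [1, 1, 0, 0, 0],
    "Amount": [100, 55, 20, 30, 50],
    "User": [123345, 123346, 123346, 123347, 123348],
})

# Assumption: pandas >= 0.25, which supports named aggregation (new_name=(column, func)).
wide = (
    df.groupby(["ID", "Flag"])
      .agg(Count=("Amount", "size"),
           Amount_SUM=("Amount", "sum"),
           User_Count=("User", "nunique"))
      .unstack(fill_value=0)  # move Flag from the index into the columns
)

# Columns are now a MultiIndex of (statistic, flag) pairs; join each pair into one name.
wide.columns = [f"Flag{flag}_{stat}" for stat, flag in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the statistic names are set inside agg, the flattened labels already match the requested ones (Flag0_Count, Flag1_Amount_SUM and so on).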