Python pandas groupby aggregate on multiple columns, then pivot

Question

In Python, I have a pandas DataFrame similar to the following:

Item | shop1 | shop2 | shop3 | Category
------------------------------------
Shoes| 45    | 50    | 53    | Clothes
TV   | 200   | 300   | 250   | Technology
Book | 20    | 17    | 21    | Books
phone| 300   | 350   | 400   | Technology

Here shop1, shop2 and shop3 hold the cost of each item in three different shops. After some data cleaning, I need to return a DataFrame like this one:

Category (index)| size| sum| mean | std
----------------------------------------

where size is the number of items in each Category, and sum, mean and std are those functions applied across the three shop columns. How can I do this with the split-apply-combine pattern (groupby, aggregate, apply, ...)?

Can someone help me out? I'm going crazy with this one...thank you!

Recommended answer

Edited for pandas 0.22+, since using a dictionary to rename columns in a groupby aggregation is deprecated.

We set up a very similar dictionary (compare option 1 below): its keys name the aggregation functions, and the dictionary itself is then used to rename the resulting columns.

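# Keys are the aggregation function names; values are the output column labels.
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
# Stack the shop columns into one long Series, group by Category, aggregate, rename.
df.set_index(['Category', 'Item']).stack().groupby('Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)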

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
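
For readers who want to reproduce the table above, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the intermediate names `prices` and `result` are only illustrative; they do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Move the non-price columns into the index, then stack the three shop
# columns into one long Series: one price per (Category, Item, shop).
prices = df.set_index(['Category', 'Item']).stack()

# Keys are aggregation function names, values are the final column labels.
rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')

result = (
    prices.groupby(level='Category')
    .agg(list(rnm_cols))        # one output column per function name
    .rename(columns=rnm_cols)   # 'size' -> 'Size', 'sum' -> 'Sum', ...
)
print(result)
```

Stacking first is what lets a single groupby treat all three shop columns as one pool of prices per category, which is why Size counts prices (3 per item) rather than items.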


Option 1
Use agg

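# Original form: a dict that maps new column names to functions.
# This renaming style is deprecated and raises SpecificationError on pandas 1.0+.
agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(agg_funcs)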

                  Std   Sum        Mean  Size
Category                                     
Books        2.081666    58   19.333333     3
Clothes      4.041452   148   49.333333     3
Technology  70.710678  1800  300.000000     6
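
On recent pandas versions the dictionary form of option 1 is rejected, but the same mapping can be passed as keyword arguments (named aggregation, available for Series groupby since pandas 0.25). The following is a sketch under that assumption, not part of the original answer; the DataFrame is rebuilt from the question's sample rows and the name `named` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Same mapping as option 1: new column name -> aggregation function name.
agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')

# Unpacking the dict turns each entry into a keyword argument,
# e.g. agg(Size='size', Sum='sum', ...), so no separate rename is needed.
named = (
    df.set_index(['Category', 'Item'])
    .stack()
    .groupby(level=0)
    .agg(**agg_funcs)
)
print(named)
```

With keyword arguments the columns come out in the order they were written (Size, Sum, Mean, Std) rather than in the order shown in the older output above.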

Option 2
More for less: use describe

# On older pandas releases (before 0.20), append .unstack() to get this wide layout.
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
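
Note that describe reports count, mean and std but not sum. If only the four statistics from the question are wanted, one possible follow-up is to keep the relevant describe columns and attach the sum separately. This is a sketch assuming a pandas version where groupby describe returns a DataFrame; the names `prices`, `stats` and `wanted` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

prices = df.set_index(['Category', 'Item']).stack()
grouped = prices.groupby(level=0)

# describe() yields one row per category and one column per statistic.
stats = grouped.describe()

# Keep only the statistics the question asks for and add the missing sum.
wanted = stats[['count', 'mean', 'std']].assign(sum=grouped.sum())
print(wanted)
```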
