Grouping a pandas dataframe by two columns (or more)?

Question

I have the following dataframe:

import pandas

mydf = pandas.DataFrame({"cat": ["first", "first", "first", "second", "second", "third"], "class": ["A", "A", "A", "B", "B", "C"], "name": ["a1", "a2", "a3", "b1", "b2", "c1"], "val": [1,5,1,1,2,10]})

I want to create a dataframe that gives summary statistics about the val column of items with the same class id. For this I use groupby as follows:

mydf.groupby("class").val.sum()

That's the correct behavior, but I'd like to retain the cat column information in the resulting df. Can that be done? Do I have to merge/join that info in later? I tried:

mydf.groupby(["cat", "class"]).val.sum()

But this uses hierarchical indexing. I'd like to get a plain dataframe back that just has the cat value for each group, where the grouping is by class. The output should be a dataframe (not a series) with the values of cat and class, where the val entries are summed over all rows that have the same class:

cat     class    val
first   A         7
second  B         3
third   C        10

Is this possible?

Recommended answer

Use reset_index:

In [9]: mydf.groupby(['cat', "class"]).val.sum().reset_index()
Out[9]: 
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
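
As a self-contained sketch of the same idea (the variable names summed and flat are only illustrative and do not come from the original answer), the grouped sum is a Series with a two-level index, and reset_index() moves both levels back into ordinary columns:

```python
import pandas as pd

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# Grouping by two columns yields a Series with a (cat, class) hierarchical index.
summed = mydf.groupby(["cat", "class"])["val"].sum()

# reset_index() turns both index levels into regular columns of a DataFrame.
flat = summed.reset_index()
print(flat)
```

Bracket access (["val"]) is used here instead of attribute access (.val); both select the same column, but brackets also work for column names that are not valid Python identifiers.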

EDIT

Pass level=1 to reset_index if you want to keep cat as the index; only the class level is moved back into a column:

In [10]: mydf.groupby(['cat', "class"]).val.sum().reset_index(level=1)
Out[10]: 
       class  val
cat              
first      A    7
second     B    3
third      C   10
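
The level can also be given by name rather than by position, which may read more clearly when there are several grouping columns. A minimal sketch, assuming the same mydf as in the question (the name by_cat is illustrative):

```python
import pandas as pd

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# level="class" refers to the same index level as level=1 here,
# so "class" becomes a column and "cat" stays as the index.
by_cat = mydf.groupby(["cat", "class"])["val"].sum().reset_index(level="class")
print(by_cat)
```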

You can also pass as_index=False to groupby to get the same output as the plain reset_index() call:

In [29]: mydf.groupby(['cat', "class"], as_index=False).val.sum()
Out[29]: 
      cat class  val
0   first     A    7
1  second     B    3
2   third     C   10
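
If the grouping really should be by class alone, as the question describes, one possible alternative is named aggregation (available in pandas 0.25 and later). This is a sketch that is not part of the original answer, and it assumes every class belongs to exactly one cat, so taking the first cat value per group loses nothing:

```python
import pandas as pd

mydf = pd.DataFrame({
    "cat": ["first", "first", "first", "second", "second", "third"],
    "class": ["A", "A", "A", "B", "B", "C"],
    "name": ["a1", "a2", "a3", "b1", "b2", "c1"],
    "val": [1, 5, 1, 1, 2, 10],
})

# Group by class only; carry cat along by taking its first value per group
# (assumption: cat is constant within each class) and sum val.
result = mydf.groupby("class", as_index=False).agg(
    cat=("cat", "first"),
    val=("val", "sum"),
)

# Reorder the columns to match the layout requested in the question.
result = result[["cat", "class", "val"]]
print(result)
```

If a class could span several cat values, grouping by both columns as shown above is the safer choice, since "first" would silently hide the extra values.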
