Breaking out column by groups in Pandas


Problem description

If I have a DataFrame like this:

 type   value   group
    a      10     one
    b      45     one
    a     224     two
    b     119     two
    a      33   three
    b      44   three

How can I get this:

 type     one     two   three
    a      10     224      33
    b      45     119      44

I thought it'd be pivot_table, but that just gives me a re-grouped list.

Recommended answer

I think you need pivot with rename_axis (new in pandas 0.18.0) and reset_index:

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119
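
For readers who want to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name `wide` is only an illustrative choice, not something from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'type':  ['a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'two', 'two', 'three', 'three'],
})

wide = (df.pivot(index='type', columns='group', values='value')
          .rename_axis(None, axis=1)   # drop the 'group' label from the columns axis
          .reset_index())              # turn 'type' back into a regular column
print(wide)
```

Note that pivot sorts the new column labels, which is why the groups come out as one, three, two rather than in their original order.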

If the order of the columns is important:

df = df.pivot(index='type', columns='group', values='value').rename_axis(None, axis=1)

print(df[['one','two','three']].reset_index())
  type  one  two  three
0    a   10  224     33
1    b   45  119     44
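
If hard-coding the column list is inconvenient, one possible variation is to take the order in which the groups first appear in the original data. This is only a sketch: `order` and `wide` are hypothetical names, and it assumes that order of first appearance is the order you want.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    'type':  ['a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'two', 'two', 'three', 'three'],
})

# Group labels in order of first appearance in the long-format data
order = df['group'].unique().tolist()

wide = (df.pivot(index='type', columns='group', values='value')
          .rename_axis(None, axis=1)[order]   # reorder the pivoted columns
          .reset_index())
print(wide)
```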

With your real data you may get an error:

print(df.pivot(index='type', columns='group', values='value')
        .rename_axis(None, axis=1)
        .reset_index())

ValueError: Index contains duplicate entries, cannot reshape

print(df)
  type  value  group
0    a     10    one
1    a     20    one
2    b     45    one
3    a    224    two
4    b    119    two
5    a     33  three
6    b     44  three
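
To find the offending rows before pivoting, one option is to flag every (type, group) pair that occurs more than once with DataFrame.duplicated. The sketch below rebuilds the seven-row data shown above; `dupes` and `pivot_failed` are illustrative names only.

```python
import pandas as pd

# The seven-row data shown above, with two rows for type 'a' in group 'one'
df = pd.DataFrame({
    'type':  ['a', 'a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 20, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'one', 'two', 'two', 'three', 'three'],
})

# keep=False marks every member of a duplicated (type, group) pair, not just the repeats
dupes = df[df.duplicated(subset=['type', 'group'], keep=False)]
print(dupes)

# pivot refuses to reshape when an index/column pair is not unique
try:
    df.pivot(index='type', columns='group', values='value')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```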

The problem is in the second row: index value a and column one now have two values, 10 and 20, so pivot cannot decide which one to place in that cell. The pivot_table function aggregates the data in this case. The default aggregation function is the mean, but you can change it with the aggfunc parameter:

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='mean')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   15     33  224
1    b   45     44  119

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='first')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   10     33  224
1    b   45     44  119

print(df.pivot_table(index='type', columns='group', values='value', aggfunc='sum')
        .rename_axis(None, axis=1)
        .reset_index())

  type  one  three  two
0    a   30     33  224
1    b   45     44  119
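
The three aggregations differ only in the duplicated cell (type a, group one). As a rough self-contained check, the sketch below runs pivot_table once per aggregation and collects that one cell. The `results` dict is a hypothetical helper for comparison, and the aggregations are passed as strings, which is an assumption about what works best on recent pandas versions.

```python
import pandas as pd

# The seven-row data with the duplicated (a, one) pair
df = pd.DataFrame({
    'type':  ['a', 'a', 'b', 'a', 'b', 'a', 'b'],
    'value': [10, 20, 45, 224, 119, 33, 44],
    'group': ['one', 'one', 'one', 'two', 'two', 'three', 'three'],
})

results = {}
for func in ['mean', 'first', 'sum']:
    wide = (df.pivot_table(index='type', columns='group',
                           values='value', aggfunc=func)
              .rename_axis(None, axis=1)
              .reset_index())
    # Record the value of the cell that had two source rows
    results[func] = wide.loc[wide['type'] == 'a', 'one'].iloc[0]

print(results)
```

Which aggregation is right depends on what the duplicate rows mean in your data: mean averages them, first keeps the earliest row, and sum adds them up.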
