Append rows to a Pandas groupby object

Question

I am trying to figure out the best way to insert the means back into a multi-indexed pandas dataframe.

Suppose I have a dataframe like this:

      metric 1     metric 2    
             R   P        R   P
foo a        0   1        2   3
    b        4   5        6   7
bar a        8   9       10  11
    b       12  13       14  15

I would like to get the following result:

      metric 1     metric 2    
             R   P        R   P
foo a        0   1        2   3
    b        4   5        6   7
    AVG      2   3        4   5
bar a        8   9       10  11
    b       12  13       14  15
    AVG     10  11       12  13

Please note, I know I can do df.mean(level=0) to get the level 0 group means as a separate dataframe. This is not exactly what I want -- I want to insert the group means as rows back into the group.
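For reference, the separate frame of group means mentioned here can be sketched as follows. This is a minimal sketch that assumes pandas 2.x, where the `level` argument of `DataFrame.mean` is no longer available and `groupby(level=0).mean()` computes the same per-group means; the variable name `means` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
row_index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
)
col_index = pd.MultiIndex.from_tuples(
    [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"), ("metric 2", "P")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index, columns=col_index)

# One row of column-wise means per level-0 label ("bar" and "foo").
means = df.groupby(level=0).mean()
print(means)
```

The result is indexed by the level-0 labels only, which is why it cannot be dropped straight back into the two-level frame.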

I am able to get the result I want, but I feel like I am doing this wrong/there is probably a one liner that I am missing that already does this without the expensive python iteration. Here is my example code:

import numpy as np
import pandas as pd

data = np.arange(16).reshape(4,4)
row_index = [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
col_index = [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"),  
    ("metric 2", "P")]
col_multiindex = pd.MultiIndex.from_tuples(col_index)
df = pd.DataFrame(data, index=pd.MultiIndex.from_tuples(row_index),
    columns=col_multiindex)

# Rebuild the frame group by group, adding an "AVG" row after each group.
new_row_index = []
data = []
for name, group in df.groupby(level=0):
    # Copy the group's existing rows unchanged.
    for index_tuple, row in group.iterrows():
        new_row_index.append(index_tuple)
        data.append(row.tolist())
    # Then add the column-wise mean of the group.
    new_row_index.append((name, "AVG"))
    data.append(group.mean().tolist())

print(pd.DataFrame(data,
    index=pd.MultiIndex.from_tuples(new_row_index),
    columns=col_multiindex))

This results in:

        metric 1     metric 2    
               R   P        R   P
bar a          8   9       10  11
    b         12  13       14  15
    AVG       10  11       12  13
foo a          0   1        2   3
    b          4   5        6   7
    AVG        2   3        4   5

which flips the order of the groups for some reason, but is more or less what I want.

Recommended answer

The main thing you need to do here is append your means to the main dataset. The main trick you need before doing that is just to conform the indexes (with reset_index() and set_index()) so that after you append them they will be more or less lined up and ready to sort based on the same keys.

In [35]: df2 = df.groupby(level=0).mean()

In [36]: df2['index2'] = 'AVG'

In [37]: df2 = df2.reset_index().set_index(['index','index2']).append(df).sort()

In [38]: df2
Out[38]: 
             metric 1     metric 2    
                    R   P        R   P
index index2                          
bar   AVG          10  11       12  13
      a             8   9       10  11
      b            12  13       14  15
foo   AVG           2   3        4   5
      a             0   1        2   3
      b             4   5        6   7
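The transcript above relies on `DataFrame.append` and `DataFrame.sort`, which newer pandas releases no longer provide. A rough equivalent is sketched below, assuming pandas 2.x; it swaps in `pd.concat` and `sort_index`, and the names `means` and `combined` are illustrative rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
row_index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
)
col_index = pd.MultiIndex.from_tuples(
    [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"), ("metric 2", "P")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index, columns=col_index)

# Per-group means, indexed by the level-0 labels.
means = df.groupby(level=0).mean()

# Conform the index: add a second level so the means share df's index shape.
means.index = pd.MultiIndex.from_arrays([means.index, ["AVG"] * len(means)])

# Stack the means under the original rows, then sort on the shared keys.
combined = pd.concat([df, means]).sort_index()
print(combined)
```

As in the transcript, sorting places "AVG" ahead of "a" and "b" within each group, because uppercase letters sort before lowercase ones. The values come out as floats, since the means are floats.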

As for ordering the rows, the best thing is probably just to choose the labels so that sorting puts them in the right place (e.g. A, B, avg). Or, for a small number of rows, you could just use fancy indexing:

In [39]: df2.ix[[4,5,3,1,2,0]]
Out[39]: 
             metric 1     metric 2    
                    R   P        R   P
index index2                          
foo   a             0   1        2   3
      b             4   5        6   7
      AVG           2   3        4   5
bar   a             8   9       10  11
      b            12  13       14  15
      AVG          10  11       12  13
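The `.ix` indexer used above has been removed from pandas. A minimal sketch of the same reordering, assuming pandas 2.x, uses `.iloc` for the positional version and `reindex` for a label-based one; the names `combined`, `by_position`, `by_label` and `order` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sorted frame with "AVG" rows (same steps as the answer).
row_index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
)
col_index = pd.MultiIndex.from_tuples(
    [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"), ("metric 2", "P")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index, columns=col_index)
means = df.groupby(level=0).mean()
means.index = pd.MultiIndex.from_arrays([means.index, ["AVG"] * len(means)])
combined = pd.concat([df, means]).sort_index()

# Positional reordering: the same integer positions as in the transcript.
by_position = combined.iloc[[4, 5, 3, 1, 2, 0]]

# Label-based reordering: spell out the desired row order explicitly.
order = [(g, k) for g in ["foo", "bar"] for k in ["a", "b", "AVG"]]
by_label = combined.reindex(pd.MultiIndex.from_tuples(order))
print(by_label)
```

The label-based form is less fragile than hard-coded positions if rows are added later, at the cost of listing the labels by hand.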
