Append rows to a Pandas groupby object
Question
I am trying to figure out the best way to insert the means back into a multi-indexed pandas dataframe.
Suppose I have a dataframe like this:
        metric 1     metric 2
               R   P        R   P
foo a          0   1        2   3
    b          4   5        6   7
bar a          8   9       10  11
    b         12  13       14  15
I would like to get the following result:
        metric 1     metric 2
               R   P        R   P
foo a          0   1        2   3
    b          4   5        6   7
    AVG        2   3        4   5
bar a          8   9       10  11
    b         12  13       14  15
    AVG       10  11       12  13
Please note, I know I can do df.mean(level=0) to get the level 0 group means as a separate dataframe. This is not exactly what I want -- I want to insert the group means as rows back into the group.
I am able to get the result I want, but I feel like I am doing this wrong/there is probably a one liner that I am missing that already does this without the expensive python iteration. Here is my example code:
import numpy as np
import pandas as pd

# Build the example frame with two-level row and column indexes.
data = np.arange(16).reshape(4, 4)
row_index = [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
col_index = [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"),
             ("metric 2", "P")]
col_multiindex = pd.MultiIndex.from_tuples(col_index)
df = pd.DataFrame(data, index=pd.MultiIndex.from_tuples(row_index),
                  columns=col_multiindex)

# Copy each group's rows, then add that group's mean as an "AVG" row.
new_row_index = []
data = []
for name, group in df.groupby(level=0):
    for index_tuple, row in group.iterrows():
        new_row_index.append(index_tuple)
        data.append(row.tolist())
    new_row_index.append((name, "AVG"))
    data.append(group.mean().tolist())

print(pd.DataFrame(data,
                   index=pd.MultiIndex.from_tuples(new_row_index),
                   columns=col_multiindex))
This results in:
        metric 1     metric 2
               R   P        R   P
bar a          8   9       10  11
    b         12  13       14  15
    AVG       10  11       12  13
foo a          0   1        2   3
    b          4   5        6   7
    AVG        2   3        4   5
which flips the order of the groups for some reason, but is more or less what I want.
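A likely explanation for the flipped order: groupby sorts the group keys by default, so "bar" comes before "foo". Passing sort=False keeps the groups in order of first appearance. The sketch below only illustrates that difference on a frame shaped like the example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=index)

# Default behaviour: group keys come back sorted.
sorted_keys = df.groupby(level=0).size().index.tolist()

# sort=False: group keys keep the order in which they first appear.
original_keys = df.groupby(level=0, sort=False).size().index.tolist()
```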
Recommended answer
The main thing you need to do here is append your means to the main dataset. The main trick before doing that is to conform the indexes (with reset_index() and set_index()) so that after you append them they are more or less lined up and ready to sort on the same keys.
In [35]: df2 = df.groupby(level=0).mean()
In [36]: df2['index2'] = 'AVG'
In [37]: df2 = df2.reset_index().set_index(['index','index2']).append(df).sort()
In [38]: df2
Out[38]:
             metric 1     metric 2
                    R   P        R   P
index index2
bar   AVG          10  11       12  13
      a             8   9       10  11
      b            12  13       14  15
foo   AVG           2   3        4   5
      a             0   1        2   3
      b             4   5        6   7
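The session above relies on DataFrame.append and DataFrame.sort, which are no longer available in recent pandas releases. A minimal sketch of the same idea for current versions follows, using pd.concat and sort_index instead. It assumes the row index levels are named "index" and "index2" to match the output above, and it builds the "AVG" level directly rather than going through reset_index() and set_index(); that is a substitution for brevity, not the answer's exact steps.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")],
    names=["index", "index2"],  # assumed names, chosen to match the output above
)
columns = pd.MultiIndex.from_tuples(
    [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"), ("metric 2", "P")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index, columns=columns)

# One row of means per level-0 key.
means = df.groupby(level=0).mean()

# Give the means the same two-level index as df, with "AVG" as the second level.
means.index = pd.MultiIndex.from_arrays(
    [means.index, ["AVG"] * len(means)], names=df.index.names
)

# Stack the original rows and the mean rows, then sort on the shared keys.
df2 = pd.concat([df, means]).sort_index()
```

Because the means are floats, the combined frame holds floats rather than integers.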
As far as ordering the rows, the best thing is probably just to set the names so that sorting puts them in the right place (e.g. A,B,avg). Or for a small number of rows you could just use fancy indexing:
In [39]: df2.ix[[4,5,3,1,2,0]]
Out[39]:
             metric 1     metric 2
                    R   P        R   P
index index2
foo   a             0   1        2   3
      b             4   5        6   7
      AVG           2   3        4   5
bar   a             8   9       10  11
      b            12  13       14  15
      AVG          10  11       12  13
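The .ix indexer was removed in pandas 1.0; for purely positional selection, .iloc is the usual replacement. The sketch below shows that, plus one possible way to get the same ordering without hard-coding positions: a stable sort on the order in which each level-0 key first appears. The stable-sort part is an added suggestion, not part of the original answer, and all variable names are illustrative.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_tuples(
    [("foo", "a"), ("foo", "b"), ("bar", "a"), ("bar", "b")]
)
columns = pd.MultiIndex.from_tuples(
    [("metric 1", "R"), ("metric 1", "P"), ("metric 2", "R"), ("metric 2", "P")]
)
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=row_index, columns=columns)

# Mean rows, indexed as (key, "AVG").
means = df.groupby(level=0).mean()
means.index = pd.MultiIndex.from_arrays([means.index, ["AVG"] * len(means)])

# Original rows first, mean rows last.
combined = pd.concat([df, means])

# Option 1: positional reordering of the sorted frame, as in the answer.
by_position = combined.sort_index().iloc[[4, 5, 3, 1, 2, 0]]

# Option 2: stable sort by first appearance of each level-0 key.
# Within a group, rows keep their order in `combined`, so "AVG" ends up last.
first_seen = df.index.get_level_values(0).unique()
rank = first_seen.get_indexer(combined.index.get_level_values(0))
by_first_seen = combined.iloc[np.argsort(rank, kind="stable")]
```

The second option avoids both the Python-level loop and the manual list of positions, at the cost of being a little less obvious to read.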