Pandas sort by group aggregate and column
Problem description
Given the following dataframe
In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]:
     A         B      C
0  foo  1.624345  False
1  bar -0.611756   True
2  baz -0.528172  False
3  foo -1.072969   True
4  bar  0.865408  False
5  baz -2.301539   True
I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with

In [28]: df.groupby('A').sum().sort('B')
Out[28]:
            B  C
A
baz -2.829710  1
bar  0.253651  1
foo  0.551377  1
And then by True/False, so that it ultimately looks like this:
In [30]: df.ix[[5, 2, 1, 4, 3, 0]]
Out[30]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
How can this be done?
Solution
Group by A:
In [0]: grp = df.groupby('A')
Within each group, sum over B and broadcast the values using transform. Then sort by B:
In [1]: grp[['B']].transform(sum).sort('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377
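The session above uses `DataFrame.sort`, which has been removed from recent pandas releases in favour of `sort_values`. The following is a minimal sketch of the same step under that assumption. The frame is typed in by hand from the values shown in the question (rounded to six places) rather than drawn from the random generator, and the names `group_sum` and `order` are illustrative only.

```python
import pandas as pd

# Values copied by hand from the question's Out[32] (rounded), so no RNG is needed.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# transform('sum') returns one value per original row: the sum of B over
# that row's group. The result keeps df's index, unlike .sum(), which
# collapses each group to a single row.
group_sum = df.groupby('A')['B'].transform('sum')

# A stable sort (mergesort) keeps rows with equal group sums in their
# original relative order.
order = group_sum.sort_values(kind='mergesort').index

print(group_sum)
print(list(order))
```

Because every row of a group carries the same broadcast sum, sorting on it moves whole groups together without losing the row labels needed for the next step.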
Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:
In [2]: sort1 = df.ix[grp[['B']].transform(sum).sort('B').index]

In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True
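The `.ix` indexer used above is no longer available in current pandas; label-based reordering is done with `.loc` instead. As a small sketch of just this reindexing step, the label order below is copied from the index shown in Out[1] rather than recomputed, and the frame is again typed in by hand from the question's values.

```python
import pandas as pd

# Values copied by hand from the question's Out[32] (rounded).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Row labels in the order produced by sorting the broadcast group sums
# (taken from Out[1] above).
order = [2, 5, 1, 4, 0, 3]

# .loc with a list of labels returns the rows in exactly that label order.
sort1 = df.loc[order]

print(sort1)
```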
Finally, sort the 'C' values within groups of 'A' using the sort=False option to preserve the A sort order from step 1:

In [4]: f = lambda x: x.sort('C', ascending=False)

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False
Clean up the df index by using reset_index with drop=True:

In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
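For readers on a current pandas version, one possible alternative (not part of the original answer) is to put the broadcast group sum in a temporary column and sort on several keys at once. This is only a sketch: the helper column name `B_sum` is made up for illustration, the frame is typed in by hand from the question's values, and 'A' is added as a middle sort key on the assumption that two groups with identical sums should still stay contiguous.

```python
import pandas as pd

# Values copied by hand from the question's Out[32] (rounded).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    # B_sum is a hypothetical helper column holding each row's group sum of B.
    df.assign(B_sum=df.groupby('A')['B'].transform('sum'))
      # Groups ascending by their sum, then by group name to break ties
      # between groups, then True before False within each group.
      .sort_values(['B_sum', 'A', 'C'], ascending=[True, True, False])
      # Drop the helper so the result has the original columns only.
      .drop(columns='B_sum')
)

print(result)
```

This avoids `groupby(...).apply`, so there is no extra index level to clean up afterwards, and the original row labels are preserved.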