Pandas sort by group aggregate and column

Views: 157  Published: 2017/3/25 22:50:50  Tags: python, sorting, group-by, dataframe, pandas


Problem description

Given the following DataFrame:
In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]:      A         B      C
         0  foo  1.624345  False
         1  bar -0.611756   True
         2  baz -0.528172  False
         3  foo -1.072969   True
         4  bar  0.865408  False
         5  baz -2.301539   True 
I would like to sort the rows group by group: first order the groups in A by the aggregated sum of B, then order the rows within each group by the value in C (not aggregated). So basically, get the order of the A groups with
In [28]: df.groupby('A').sum().sort_values('B')
Out[28]:             B  C
         A               
         baz -2.829710  1
         bar  0.253651  1
         foo  0.551377  1
And then by True/False, so that it ultimately looks like this:
In [30]: df.loc[[5, 2, 1, 4, 3, 0]]
Out[30]:      A         B      C
         5  baz -2.301539   True
         2  baz -0.528172  False
         1  bar -0.611756   True
         4  bar  0.865408  False
         3  foo -1.072969   True
         0  foo  1.624345  False
How can this be done?
Solution
Group by A:
In [0]: grp = df.groupby('A')
Within each group, sum over B and broadcast the values using transform.  Then sort by B:
In [1]: grp[['B']].transform('sum').sort_values('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377
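A minimal sketch of this step, assuming a recent pandas release (where sorting is done with `sort_values`) and rebuilding `df` from the literal values printed in the question rather than from the random generator. The names `group_sums` and `ordered` are illustrative, and `kind='stable'` is an added assumption so that rows with equal sums keep their original relative order.

```python
import pandas as pd

# Literal copy of the example frame shown in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# transform('sum') returns one value per original row (not one per group),
# so the result keeps the same index as df.
group_sums = df.groupby('A')[['B']].transform('sum')

# A stable sort keeps rows that share a group sum in their original order.
ordered = group_sums.sort_values('B', kind='stable')
print(ordered.index.tolist())
```

Because every row carries its group's sum, the sorted index already lists the rows group by group.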
Index the original df by passing the index from above.  This will re-order the A values by the aggregate sum of the B values:
In [2]: sort1 = df.loc[grp[['B']].transform('sum').sort_values('B').index]

In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True
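The reindexing step can be sketched on its own as follows, assuming a recent pandas release in which label-based selection goes through `.loc`. The frame is rebuilt from the literal values shown in the question, and the name `order` is illustrative.

```python
import pandas as pd

# Literal copy of the example frame shown in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Row labels ordered by each row's group sum of B (stable for ties).
order = (
    df.groupby('A')['B']
      .transform('sum')
      .sort_values(kind='stable')
      .index
)

# Passing the labels to .loc returns the rows of df in that order.
sort1 = df.loc[order]
print(sort1)
```

At this point the groups are in the right order, but the rows inside each group are still in their original order.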
Finally, sort the 'C' values within each group of 'A'. Pass sort=False to groupby so the groups keep the A order from the previous step instead of being re-sorted alphabetically:
In [4]: f = lambda x: x.sort_values('C', ascending=False)

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False
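To see what `sort=False` changes, here is a small sketch under the same assumptions (recent pandas, frame rebuilt from the printed values). It deliberately swaps `apply` for an explicit loop over the groups plus `pd.concat`, which is not the answer's method. That variant keeps the original row labels, so it produces no extra `A` index level to clean up afterwards.

```python
import pandas as pd

# Literal copy of the example frame, already ordered by group sum (sort1).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})
sort1 = df.loc[[2, 5, 1, 4, 0, 3]]

# With sort=False, groups are visited in order of first appearance.
group_order = [key for key, _ in sort1.groupby('A', sort=False)]
# With the default sort=True, group keys are sorted alphabetically.
default_order = [key for key, _ in sort1.groupby('A')]

# Sort each group by C (True first) and stitch the pieces back together.
sort2 = pd.concat(
    group.sort_values('C', ascending=False)
    for _, group in sort1.groupby('A', sort=False)
)
print(group_order, default_order)
print(sort2)
```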
Clean up the df index by using reset_index with drop=True:
In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
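As an alternative to the step-by-step approach, the same ordering can probably be reached with a single `sort_values` call on a temporary helper column. This is a sketch of a different technique, not the method given in the answer. It assumes a recent pandas release, and the column name `B_sum` is hypothetical.

```python
import pandas as pd

# Literal copy of the example frame shown in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    df.assign(B_sum=df.groupby('A')['B'].transform('sum'))  # per-row group sum
      .sort_values(['B_sum', 'C'], ascending=[True, False])  # groups, then True first
      .drop(columns='B_sum')                                 # remove the helper column
)
print(result)
```

If two different groups could have exactly the same sum, their rows would interleave. Adding `'A'` as a sort key between `'B_sum'` and `'C'` would keep each group together.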


                        