Pandas sort by group aggregate and column

Question

Given the following dataframe

In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]:      A         B      C
         0  foo  1.624345  False
         1  bar -0.611756   True
         2  baz -0.528172  False
         3  foo -1.072969   True
         4  bar  0.865408  False
         5  baz -2.301539   True 

I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with

In [28]: df.groupby('A').sum().sort_values('B')
Out[28]:             B  C
         A               
         baz -2.829710  1
         bar  0.253651  1
         foo  0.551377  1

And then by True/False, so that it ultimately looks like this:

In [30]: df.loc[[5, 2, 1, 4, 3, 0]]
Out[30]:      A         B      C
         5  baz -2.301539   True
         2  baz -0.528172  False
         1  bar -0.611756   True
         4  bar  0.865408  False
         3  foo -1.072969   True
         0  foo  1.624345  False

How can this be done?
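The first half of the request, the order of the A groups by their total B, can be computed on its own with a plain aggregation. This is a minimal sketch assuming a current pandas release (where sort_values replaces the old DataFrame.sort); the frame is rebuilt from the rounded values in the table above instead of the random generator, so the sums differ from Out[28] only in the last digit.

```python
import pandas as pd

# Rebuild the example frame from the (rounded) values shown in the table above,
# so the result does not depend on the random number generator.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})

# One row per group: the sum of B for each value of A, sorted ascending.
group_sums = df.groupby("A")["B"].sum().sort_values()

# The group labels in the order they should appear in the final frame.
group_order = group_sums.index.tolist()
print(group_order)  # ['baz', 'bar', 'foo']
```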

Solution

Groupby A:

In [0]: grp = df.groupby('A')

Within each group, sum over B and broadcast the values using transform. Then sort by B:

In [1]: grp[['B']].transform('sum').sort_values('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377
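The difference between aggregating and transforming is what makes this step work: an aggregation returns one row per group, while transform repeats each group's result on every original row, keeping the original index. A small sketch under the same assumptions as before (current pandas, frame rebuilt from the rounded table values); the explicit stable sort (kind="mergesort") is an addition that pins down the order of tied rows from the same group.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})
grp = df.groupby("A")

# Aggregation: one value per group (3 rows, indexed by A).
per_group = grp["B"].sum()

# transform: the group sum repeated on every row of that group
# (6 rows, aligned with df's original index).
broadcast = grp["B"].transform("sum")

print(len(per_group), len(broadcast))    # 3 6
print(broadcast.index.equals(df.index))  # True

# A stable sort keeps tied rows (same group) in their original index order.
order = broadcast.sort_values(kind="mergesort").index.tolist()
print(order)  # [2, 5, 1, 4, 0, 3]
```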

Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:

In [2]: sort1 = df.loc[grp[['B']].transform('sum').sort_values('B').index]

In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True
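In current pandas .ix no longer exists (it was removed in pandas 1.0); label-based .loc does the same job here, because the sorted index holds row labels of df. A sketch of this reordering step, again using the frame rebuilt from the rounded table values:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})
grp = df.groupby("A")

# Row labels of df, ordered by their group's total B (ties kept in original order).
order = grp["B"].transform("sum").sort_values(kind="mergesort").index

# .loc selects rows by label in the order given, which regroups df by A.
sort1 = df.loc[order]
print(sort1.index.tolist())  # [2, 5, 1, 4, 0, 3]
print(sort1["A"].tolist())   # ['baz', 'baz', 'bar', 'bar', 'foo', 'foo']
```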

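Finally, sort the C values within each group of A (descending, so True comes before False). Passing sort=False to groupby keeps the groups in the order in which they first appear in sort1, which preserves the A order established above:

In [4]: f = lambda x: x.sort_values('C', ascending=False)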

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False
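The effect of sort=False can be checked directly by listing the group keys in iteration order. The sketch below also swaps groupby().apply for an explicit loop plus pd.concat (a substitute technique, not the answer's own code); because no group keys are added to the index, the result needs no reset_index afterwards. It assumes sort1 from In [3], written out here by its row labels.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})
# sort1 from the previous step: rows grouped by A in ascending order of sum(B).
sort1 = df.loc[[2, 5, 1, 4, 0, 3]]

# sort=False keeps groups in order of first appearance; the default sorts the keys.
keys_unsorted = [name for name, _ in sort1.groupby("A", sort=False)]
keys_sorted = [name for name, _ in sort1.groupby("A")]
print(keys_unsorted)  # ['baz', 'bar', 'foo']
print(keys_sorted)    # ['bar', 'baz', 'foo']

# Sort C descending (True before False) inside each group, then stitch the
# groups back together in the preserved order.
sort2 = pd.concat(
    [g.sort_values("C", ascending=False) for _, g in sort1.groupby("A", sort=False)]
)
print(sort2.index.tolist())  # [5, 2, 1, 4, 3, 0]
```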

Clean up the index by dropping the added outer A level with reset_index(0, drop=True):

In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
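The same final order can also be reached with a single multi-column sort, by storing the broadcast group total in a temporary column. This is an alternative technique rather than the answer's method; B_sum is a hypothetical helper-column name. Including A as a secondary key keeps each group contiguous even if two groups happen to have the same total.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})

out = (
    # B_sum is a hypothetical helper column: each row's group total of B.
    df.assign(B_sum=df.groupby("A")["B"].transform("sum"))
    # Group total ascending; A keeps groups contiguous if two totals tie;
    # C descending puts True before False within each group.
    .sort_values(["B_sum", "A", "C"], ascending=[True, True, False])
    .drop(columns="B_sum")
)
print(out.index.tolist())  # [5, 2, 1, 4, 3, 0]
```

Compared with the groupby/apply route, this keeps the original index as-is and avoids building and then dropping a MultiIndex level.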
