Pandas groupby(), agg() - how to return results without the multi-index?

Problem description

I have a dataframe:

pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ]
Out[67]: 
     EVENT_ID  SELECTION_ID   ODDS
0   100429300       5297529  18.00
1   100429300       5297529  20.00
2   100429300       5297529  21.00
3   100429300       5297529  22.00
4   100429300       5297529  23.00
5   100429300       5297529  24.00
6   100429300       5297529  25.00

When I use groupby and agg, I get results with a multi-index:

pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] )
Out[68]: 
                         amin   amax
EVENT_ID  SELECTION_ID              
100428417 5490293        1.71   1.71
          5881623        1.14   1.35
          5922296        2.00   2.00
          5956692        2.00   2.02
100428419 603721         2.44   2.90
          4387436        4.30   6.20
          4398859        1.23   1.35
          4574687        1.35   1.46
          4881396       14.50  19.00
          6032606        2.94   4.20
          6065580        2.70   5.80
          6065582        2.42   3.65
100428421 5911426        2.22   2.52

I have tried using as_index to return the results without the multi-index:

pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ], as_index=False )[ 'ODDS' ].agg( [ np.min, np.max ], as_index=False )

But it still gives me a multi-index.

I can use .reset_index(), but it is very slow:

pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] ).reset_index()
Out[69]: 
     EVENT_ID  SELECTION_ID   amin   amax
0   100428417       5490293   1.71   1.71
1   100428417       5881623   1.14   1.35
2   100428417       5922296   2.00   2.00
3   100428417       5956692   2.00   2.02
4   100428419        603721   2.44   2.90
5   100428419       4387436   4.30   6.20

How can I return the results without the multi-index, using only parameters of the groupby and/or agg functions, and without having to resort to reset_index()?

Solution

The call below:

>>> gr = df.groupby(['EVENT_ID', 'SELECTION_ID'], as_index=False)
>>> res = gr.agg({'ODDS':[np.min, np.max]})
>>> res
    EVENT_ID SELECTION_ID ODDS     
                          amin amax
0  100429300      5297529   18   25
1  100429300      5297559   30   38

returns a frame with multi-index columns. If you do not want the columns to be multi-index either, you can flatten them:

>>> res.columns = list(map(''.join, res.columns.values))
>>> res
    EVENT_ID  SELECTION_ID  ODDSamin  ODDSamax
0  100429300       5297529        18        25
1  100429300       5297559        30        38
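
As an alternative to flattening the columns afterwards, pandas 0.25 and later support named aggregation, which should produce flat column names directly. The following is a minimal sketch, not part of the original answer: the toy data and the output names odds_min and odds_max are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data shaped like the frame in the question (values are illustrative).
df = pd.DataFrame({
    "EVENT_ID": [100429300, 100429300, 100429300, 100429300],
    "SELECTION_ID": [5297529, 5297529, 5297559, 5297559],
    "ODDS": [18.0, 25.0, 30.0, 38.0],
})

# as_index=False keeps the group keys as ordinary columns.
# Named aggregation: each keyword is an output column name (hypothetical
# names here), mapped to a (source column, aggregation function) pair.
res = df.groupby(["EVENT_ID", "SELECTION_ID"], as_index=False).agg(
    odds_min=("ODDS", "min"),
    odds_max=("ODDS", "max"),
)

print(res)
```

With this form neither the row index nor the columns are a MultiIndex, so no reset_index() call or column renaming step is needed.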
