Pandas groupby(), agg() - how to return results without the multi-index?
I have a dataframe:
pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ]
Out[67]:
    EVENT_ID  SELECTION_ID   ODDS
0  100429300       5297529  18.00
1  100429300       5297529  20.00
2  100429300       5297529  21.00
3  100429300       5297529  22.00
4  100429300       5297529  23.00
5  100429300       5297529  24.00
6  100429300       5297529  25.00
When I use groupby and agg, I get results with a multi-index:
pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] )
Out[68]:
                         amin   amax
EVENT_ID  SELECTION_ID
100428417 5490293        1.71   1.71
          5881623        1.14   1.35
          5922296        2.00   2.00
          5956692        2.00   2.02
100428419 603721         2.44   2.90
          4387436        4.30   6.20
          4398859        1.23   1.35
          4574687        1.35   1.46
          4881396       14.50  19.00
          6032606        2.94   4.20
          6065580        2.70   5.80
          6065582        2.42   3.65
100428421 5911426        2.22   2.52
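To make the behaviour above easy to reproduce, here is a minimal sketch. The sample values are made up for illustration (only loosely modelled on the output shown), and the string names 'min' and 'max' stand in for np.min and np.max, so the result columns are named min and max rather than amin and amax.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the frame in the question.
pe_odds = pd.DataFrame({
    "EVENT_ID": [100428417, 100428417, 100428417, 100428419, 100428419],
    "SELECTION_ID": [5490293, 5881623, 5881623, 603721, 603721],
    "ODDS": [1.71, 1.14, 1.35, 2.44, 2.90],
})

# Grouping on two keys puts both keys into the row index of the result.
res = pe_odds.groupby(["EVENT_ID", "SELECTION_ID"])["ODDS"].agg(["min", "max"])

# Inspect the index type, the index level names and the value columns.
print(type(res.index).__name__)
print(res.index.names)
print(list(res.columns))
```

Checking type(res.index) like this is a quick way to confirm whether a given groupby call has actually removed the multi-index.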
I have tried using as_index to return the results without the multi-index:
pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ], as_index=False )[ 'ODDS' ].agg( [ np.min, np.max ], as_index=False )
But it still gives me a multi-index.
I can use .reset_index(), but it is very slow:
pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] ).reset_index()
Out[69]:
    EVENT_ID  SELECTION_ID  amin  amax
0  100428417       5490293  1.71  1.71
1  100428417       5881623  1.14  1.35
2  100428417       5922296  2.00  2.00
3  100428417       5956692  2.00  2.02
4  100428419        603721  2.44  2.90
5  100428419       4387436  4.30  6.20
How can I return the results without the multi-index, using only parameters of the groupby and/or agg functions, and without having to resort to reset_index()?
The call below:
>>> gr = df.groupby(['EVENT_ID', 'SELECTION_ID'], as_index=False)
>>> res = gr.agg({'ODDS':[np.min, np.max]})
>>> res
    EVENT_ID  SELECTION_ID  ODDS
                            amin  amax
0  100429300       5297529    18    25
1  100429300       5297559    30    38
returns a frame whose rows have a plain integer index but whose columns are a multi-index. If you do not want the columns to be a multi-index either, you can flatten them:
>>> res.columns = list(map(''.join, res.columns.values))
>>> res
    EVENT_ID  SELECTION_ID  ODDSamin  ODDSamax
0  100429300       5297529        18        25
1  100429300       5297559        30        38
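Two variations on the flattening step above, as a sketch rather than the only way to do it. The first joins the column levels with an underscore and skips empty levels, so the key columns keep their plain names. The second uses named aggregation, which is available in pandas 0.25 and later and produces flat columns directly. The sample data and the output names odds_min and odds_max are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the answer.
df = pd.DataFrame({
    "EVENT_ID": [100429300] * 4,
    "SELECTION_ID": [5297529, 5297529, 5297559, 5297559],
    "ODDS": [18.0, 25.0, 30.0, 38.0],
})
keys = ["EVENT_ID", "SELECTION_ID"]

# Option 1: aggregate with a dict, then flatten the two-level columns.
# Empty level values (on the key columns) are skipped, so "EVENT_ID"
# does not end up as "EVENT_ID_".
res = df.groupby(keys, as_index=False).agg({"ODDS": ["min", "max"]})
res.columns = ["_".join(part for part in col if part) for col in res.columns.values]
print(res)

# Option 2: named aggregation (assumes pandas >= 0.25); each keyword is an
# output column name mapped to a (source column, function) pair.
named = df.groupby(keys, as_index=False).agg(
    odds_min=("ODDS", "min"),
    odds_max=("ODDS", "max"),
)
print(named)
```

With as_index=False the grouping keys stay as ordinary columns in both variants, so no reset_index() call is needed.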