Get row value of maximum count after applying group by in pandas
Question
I have the following df:
In [260]: df
Out[260]:
    size market vegetable confirm availability
0  Large    ABC    Tomato                  NaN
1  Large    XYZ    Tomato                  NaN
2  Small    ABC    Tomato                  NaN
3  Large    ABC     Onion                  NaN
4  Small    ABC     Onion                  NaN
5  Small    XYZ     Onion                  NaN
6  Small    XYZ     Onion                  NaN
7  Small    XYZ   Cabbage                  NaN
8  Large    XYZ   Cabbage                  NaN
9  Small    ABC   Cabbage                  NaN
1) How do I get, for each vegetable, the size whose count is the maximum?
I used groupby on vegetable and size to get the following df, but I need only the rows that hold the maximum size count for each vegetable:
In [262]: df.groupby(['vegetable','size']).count()
Out[262]:
                 market  confirm availability
vegetable size
Cabbage   Large       1                     0
          Small       2                     0
Onion     Large       1                     0
          Small       3                     0
Tomato    Large       2                     0
          Small       1                     0
df2['vegetable','size'] = df.groupby(['vegetable','size']).count().apply( some logic )
Required df:
vegetable size max_count
0 Cabbage Small 2
1 Onion Small 3
2 Tomato Large 2
2) From the df I can now say that small cabbages are available in large quantity. So I need to populate the confirm availability column with Small for all cabbage rows, and do the same for the other vegetables. How can I do this?
size market vegetable confirm availability
0 Large ABC Tomato Large
1 Large XYZ Tomato Large
2 Small ABC Tomato Large
3 Large ABC Onion Small
4 Small ABC Onion Small
5 Small XYZ Onion Small
6 Small XYZ Onion Small
7 Small XYZ Cabbage Small
8 Large XYZ Cabbage Small
9 Small ABC Cabbage Small
Recommended answer
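1)
# veg_df is presumably the DataFrame shown in the question (called df there).
# Count rows per (vegetable, size); the count ends up in the 'market' column.
# Sorting by that count and keeping the last row per vegetable leaves the most frequent size.
required_df = veg_df.groupby(['vegetable','size'], as_index=False)['market'].count()\
    .sort_values(by=['vegetable', 'market'])\
    .drop_duplicates(subset='vegetable', keep='last')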
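To check step 1 against the sample data, here is a minimal self-contained sketch. It rebuilds the question's frame by hand (the `confirm availability` column is left out because it is all NaN) and renames the count column to `max_count` to match the required df. The rename and the `reset_index` call are additions for readability, not part of the original answer. Ties between sizes are not handled specially: whichever tied row sorts last is kept.

```python
import pandas as pd

# Sample data copied from the question.
veg_df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato"] * 3 + ["Onion"] * 4 + ["Cabbage"] * 3,
})

# Count rows per (vegetable, size) and give the count a descriptive name.
counts = (
    veg_df.groupby(["vegetable", "size"], as_index=False)["market"]
    .count()
    .rename(columns={"market": "max_count"})
)

# Within each vegetable, the largest count sorts last, so keep the last row.
required_df = (
    counts.sort_values(by=["vegetable", "max_count"])
    .drop_duplicates(subset="vegetable", keep="last")
    .reset_index(drop=True)
)

print(required_df)
```

With the sample data this should leave one row per vegetable: Cabbage/Small, Onion/Small and Tomato/Large.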
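2)
# Attach each vegetable's most frequent size to every row of that vegetable.
# Columns present in both frames get the suffixes _x (veg_df) and _y (required_df).
merged_df = veg_df.merge(required_df, on='vegetable')
cols = ['size_x', 'market_x', 'vegetable', 'size_y']
dict_renaming_cols = {'size_x': 'size',
                      'market_x': 'market',
                      'size_y': 'confirm_availability'}
# Keep only the needed columns and restore the original names.
merged_df = merged_df.loc[:,cols].rename(columns=dict_renaming_cols)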
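As an alternative to the merge, the column can be filled in place by mapping each vegetable to its most frequent size. This is a sketch of a different technique, not the answer's method. It assumes the column keeps the question's name `confirm availability`, and the name `top_size` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
veg_df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato"] * 3 + ["Onion"] * 4 + ["Cabbage"] * 3,
})

# For each vegetable, pick the size label with the highest count.
top_size = veg_df.groupby("vegetable")["size"].agg(
    lambda s: s.value_counts().idxmax()
)

# Look up every row's vegetable in that Series; row order is unchanged.
veg_df["confirm availability"] = veg_df["vegetable"].map(top_size)

print(veg_df)
```

This avoids the `_x`/`_y` suffixes and the renaming step, at the cost of running a small Python function once per group.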