Get row value of maximum count after applying group by in pandas


Problem description

I have the following df:

In [260]: df
Out[260]:
    size market vegetable  confirm availability
0  Large    ABC    Tomato                   NaN
1  Large    XYZ    Tomato                   NaN
2  Small    ABC    Tomato                   NaN
3  Large    ABC     Onion                   NaN
4  Small    ABC     Onion                   NaN
5  Small    XYZ     Onion                   NaN
6  Small    XYZ     Onion                   NaN
7  Small    XYZ   Cabbage                   NaN
8  Large    XYZ   Cabbage                   NaN
9  Small    ABC   Cabbage                   NaN

1) For each vegetable, how do I get the size whose count is the maximum?

I used groupby on vegetable and size to get the following df, but I need only the rows that contain the maximum size count for each vegetable.

In [262]: df.groupby(['vegetable','size']).count()
Out[262]:
                 market  confirm availability
vegetable size
Cabbage   Large       1                     0
          Small       2                     0
Onion     Large       1                     0
          Small       3                     0
Tomato    Large       2                     0
          Small       1                     0

df2['vegetable','size'] = df.groupby(['vegetable','size']).count().apply( some logic )

Required df:

  vegetable   size   max_count
0   Cabbage   Small     2
1     Onion   Small     3
2    Tomato   Large     2

2) From df I can now say that Small cabbages are available in the largest quantity, so I need to populate the confirm availability column with Small for all Cabbage rows (and likewise with the most common size for every other vegetable). How can I do this?

    size market vegetable  confirm availability
0  Large    ABC    Tomato                   Large
1  Large    XYZ    Tomato                   Large
2  Small    ABC    Tomato                   Large
3  Large    ABC     Onion                   Small
4  Small    ABC     Onion                   Small
5  Small    XYZ     Onion                   Small
6  Small    XYZ     Onion                   Small
7  Small    XYZ   Cabbage                   Small    
8  Large    XYZ   Cabbage                   Small    
9  Small    ABC   Cabbage                   Small

Recommended answer

1)

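# veg_df is the df from the question.
# Count rows per (vegetable, size); the count ends up in the 'market' column.
# After sorting by count, the last row per vegetable is its most common size.
required_df = veg_df.groupby(['vegetable','size'], as_index=False)['market'].count()\
         .sort_values(by=['vegetable', 'market'])\
         .drop_duplicates(subset='vegetable', keep='last')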
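To check step 1 end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question. As a small variation on the answer above, it counts rows with `size()` rather than `count()` on `market`, and names the result `max_count` to match the required df. Both choices are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuilt from the table in the question (assumed to be the full data set).
veg_df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato", "Tomato", "Tomato", "Onion", "Onion",
                  "Onion", "Onion", "Cabbage", "Cabbage", "Cabbage"],
    "confirm availability": np.nan,
})

# Number of rows for every (vegetable, size) pair.
counts = (
    veg_df.groupby(["vegetable", "size"])
    .size()
    .reset_index(name="max_count")
)

# Sort by count within each vegetable and keep the last (largest) row.
required_df = (
    counts.sort_values(["vegetable", "max_count"])
    .drop_duplicates(subset="vegetable", keep="last")
    .reset_index(drop=True)
)

print(required_df)
```

If two sizes tie for a vegetable, this keeps whichever one sorts last, so a tie-breaking rule would need to be added if that matters.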

2)

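# Attach each vegetable's most common size (size_y) to every original row.
merged_df = veg_df.merge(required_df, on='vegetable')
cols = ['size_x', 'market_x', 'vegetable', 'size_y']
dict_renaming_cols = {'size_x': 'size', 
                      'market_x': 'market',
                      'size_y': 'confirm_availability'}
# Note: the original 'confirm availability' column is dropped here and the
# new column is named 'confirm_availability' (with an underscore).
merged_df = merged_df.loc[:,cols].rename(columns=dict_renaming_cols)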
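If you would rather fill the existing column in place and keep the original column name and row order, one possible alternative is to map each vegetable to its most common size. This is a sketch under the same assumption as before (the DataFrame is rebuilt from the question's table), and `top_size` is a name introduced here for illustration.

```python
import pandas as pd

# Rebuilt from the table in the question (assumed to be the full data set).
veg_df = pd.DataFrame({
    "size": ["Large", "Large", "Small", "Large", "Small",
             "Small", "Small", "Small", "Large", "Small"],
    "market": ["ABC", "XYZ", "ABC", "ABC", "ABC",
               "XYZ", "XYZ", "XYZ", "XYZ", "ABC"],
    "vegetable": ["Tomato", "Tomato", "Tomato", "Onion", "Onion",
                  "Onion", "Onion", "Cabbage", "Cabbage", "Cabbage"],
})

# Series indexed by vegetable, holding the size that occurs most often.
top_size = veg_df.groupby("vegetable")["size"].agg(
    lambda s: s.value_counts().idxmax()
)

# Look up every row's vegetable in that Series and write the result in place.
veg_df["confirm availability"] = veg_df["vegetable"].map(top_size)

print(veg_df)
```

This avoids the `_x`/`_y` suffixes and the renaming step, at the cost of not producing the intermediate count table.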
