Group by factor and return second lowest value of other column

Problem description

I'd like to group this data frame by the values in the zipcode column and, for each group, return the second lowest, the lowest, or the max value of another column (called rate).

For example, from this df:

zipcode state   county_code name    rate_area_x plan_id metal_level rate    rate_area_y
36749   AL  1001    Autauga 11  52161YL6358432  Silver  245.82  6
36749   AL  1001    Autauga 11  01100AO4222848  Silver  271.77  5
36749   AL  1001    Autauga 11  24848KC5063721  Silver  264.84  1
36749   AL  1001    Autauga 11  89885YK0256118  Silver  269.11  8
36749   AL  1001    Autauga 11  65392ON5819785  Silver  305.02  12
30165   AL  1019    Cherokee    13  52161YL6358432  Silver  245.82  6
30165   AL  1019    Cherokee    13  01100AO4222848  Silver  271.77  5
30165   AL  1019    Cherokee    13  24848KC5063721  Silver  264.84  1
30165   AL  1019    Cherokee    13  89885YK0256118  Silver  269.11  8
30165   AL  1019    Cherokee    13  65392ON5819785  Silver  305.02  12
30165   AL  1019    Cherokee    13  90884WN5801293  Silver  323.25  2
30165   AL  1019    Cherokee    13  79113BU1788705  Silver  344.81  7

I'd expect:

zipcode rate
36749   245.82
30165   245.82

In R I'd do this to get the min value for each zipcode group:

grouped_df <- df %>%
              group_by(zipcode) %>%
              summarise(rate = min(rate))

But how do I get the second lowest rate value using pandas in Python?

Solution

Edit: The original answer below returns both the smallest and the 2nd smallest values, which covers the general case. However, as @WenYoBen mentioned in the comments, you probably want only the 2nd lowest. If so, chain reset_index, drop, and drop_duplicates to get either the smallest or the 2nd smallest, as follows:

Get smallest:

# drop(columns=...) replaces drop('level_1', 1); pandas 2.0+ rejects the positional axis
df.groupby('zipcode').rate.nsmallest(2).reset_index().drop(columns='level_1') \
  .drop_duplicates(subset=['zipcode'])

Out[2108]:
       zipcode    rate
    0    30165  245.82
    2    36749  245.82


Get 2nd smallest:

# keep='last' retains the second of the two rows kept per zipcode
df.groupby('zipcode').rate.nsmallest(2).reset_index().drop(columns='level_1') \
  .drop_duplicates(subset=['zipcode'], keep='last')

Out[2109]:
   zipcode    rate
1    30165  264.84
3    36749  264.84    
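
To try the two snippets above end to end, here is a minimal, self-contained sketch. It assumes only the zipcode and rate columns matter, so the DataFrame is rebuilt from the question's sample rows with just those two columns. The variable names two_smallest, smallest, and second_smallest are illustrative and not part of the original answer. It uses drop(columns='level_1') because, as far as I know, pandas 2.0 and later no longer accept the axis as a positional argument.

```python
import pandas as pd

# Only the two relevant columns, copied from the question's sample data.
df = pd.DataFrame({
    "zipcode": [36749] * 5 + [30165] * 7,
    "rate": [245.82, 271.77, 264.84, 269.11, 305.02,
             245.82, 271.77, 264.84, 269.11, 305.02, 323.25, 344.81],
})

# Two smallest rates per zipcode, flattened back into a regular DataFrame.
two_smallest = (
    df.groupby("zipcode")["rate"]
      .nsmallest(2)
      .reset_index()
      .drop(columns="level_1")  # original row labels, not needed here
)

# Within each zipcode the rows are ordered ascending by rate, so the first
# row is the minimum and the last row is the 2nd smallest.
smallest = two_smallest.drop_duplicates(subset=["zipcode"], keep="first")
second_smallest = two_smallest.drop_duplicates(subset=["zipcode"], keep="last")

print(second_smallest)
```

Two caveats with this approach: a zipcode with a single row yields that row for both keep='first' and keep='last', and if two rows tie for the minimum, the "2nd smallest" is the same value as the smallest.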


Original:

groupby.nsmallest(2) gives you the smallest and 2nd smallest values of each group:

df.groupby('zipcode').rate.nsmallest(2)

Out[2083]:
zipcode
30165    5    245.82
         7    264.84
36749    0    245.82
         2    264.84
Name: rate, dtype: float64
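
Since the question also mentions the lowest and the max rate, a single aggregation can return all three per zipcode. This is a sketch of an alternative, not the method from the answer above: second_lowest is a hypothetical helper that, by assumption, treats tied rates as one value and returns NaN when a group has fewer than two distinct rates.

```python
import numpy as np
import pandas as pd

# Only the two relevant columns, copied from the question's sample data.
df = pd.DataFrame({
    "zipcode": [36749] * 5 + [30165] * 7,
    "rate": [245.82, 271.77, 264.84, 269.11, 305.02,
             245.82, 271.77, 264.84, 269.11, 305.02, 323.25, 344.81],
})


def second_lowest(rates: pd.Series) -> float:
    """Hypothetical helper: 2nd smallest distinct value, NaN if there is none."""
    distinct = rates.drop_duplicates().nsmallest(2)
    return distinct.iloc[1] if len(distinct) == 2 else np.nan


# Named aggregation: each keyword becomes an output column.
summary = (
    df.groupby("zipcode")["rate"]
      .agg(lowest="min", second_lowest=second_lowest, highest="max")
      .reset_index()
)

print(summary)
```

Whether ties should count once or twice depends on the use case. Dropping the drop_duplicates() call inside the helper makes it count every row instead.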
