Group by factor and return second lowest value of other column
Problem description
I'd like to group this data frame by the values in the zipcode column and return, from another column (called rate), the second lowest rate, the lowest rate, or the max rate.
For example, from this df:
zipcode state county_code name rate_area_x plan_id metal_level rate rate_area_y
36749 AL 1001 Autauga 11 52161YL6358432 Silver 245.82 6
36749 AL 1001 Autauga 11 01100AO4222848 Silver 271.77 5
36749 AL 1001 Autauga 11 24848KC5063721 Silver 264.84 1
36749 AL 1001 Autauga 11 89885YK0256118 Silver 269.11 8
36749 AL 1001 Autauga 11 65392ON5819785 Silver 305.02 12
30165 AL 1019 Cherokee 13 52161YL6358432 Silver 245.82 6
30165 AL 1019 Cherokee 13 01100AO4222848 Silver 271.77 5
30165 AL 1019 Cherokee 13 24848KC5063721 Silver 264.84 1
30165 AL 1019 Cherokee 13 89885YK0256118 Silver 269.11 8
30165 AL 1019 Cherokee 13 65392ON5819785 Silver 305.02 12
30165 AL 1019 Cherokee 13 90884WN5801293 Silver 323.25 2
30165 AL 1019 Cherokee 13 79113BU1788705 Silver 344.81 7
I'd expect:
zipcode rate
36749 245.82
30165 245.82
In R I'd do this to get the min value for each zipcode group:
grouped_df <- df %>%
  group_by(zipcode) %>%
  summarise(rate = min(rate))
But how do I get the second lowest rate value using Python's pandas?
Edit: I originally gave both the smallest and the 2nd smallest, for the general case. However, as @WenYoBen mentioned in the comments, you probably only want the 2nd lowest. If that is the case, you just need to chain reset_index, drop, and drop_duplicates to get the smallest or the 2nd smallest, as follows:
Get smallest:
df.groupby('zipcode').rate.nsmallest(2).reset_index().drop(columns='level_1') \
    .drop_duplicates(subset=['zipcode'])
Out[2108]:
zipcode rate
0 30165 245.82
2 36749 245.82
Get 2nd smallest:
df.groupby('zipcode').rate.nsmallest(2).reset_index().drop(columns='level_1') \
    .drop_duplicates(subset=['zipcode'], keep='last')
Out[2109]:
zipcode rate
1 30165 264.84
3 36749 264.84
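For anyone who wants to run the chain end to end, here is a minimal self-contained sketch. It assumes a frame rebuilt from only the zipcode and rate columns of the question's sample, and it uses drop(columns=...) because pandas 2.0 and later no longer accept the axis as a positional argument to drop. The variable names two_smallest, lowest, and second_lowest are my own, not from the question.

```python
import pandas as pd

# Sample rebuilt from the question (only the two columns the answer uses).
df = pd.DataFrame({
    "zipcode": [36749] * 5 + [30165] * 7,
    "rate": [245.82, 271.77, 264.84, 269.11, 305.02,
             245.82, 271.77, 264.84, 269.11, 305.02, 323.25, 344.81],
})

# Two smallest rates per zipcode, flattened back into ordinary columns.
# The unnamed inner index level becomes a column called 'level_1'.
two_smallest = (
    df.groupby("zipcode")["rate"]
      .nsmallest(2)
      .reset_index()
      .drop(columns="level_1")
)

# First row per zipcode is the smallest, last row is the 2nd smallest.
lowest = two_smallest.drop_duplicates(subset=["zipcode"])
second_lowest = two_smallest.drop_duplicates(subset=["zipcode"], keep="last")

print(second_lowest)
```

Two caveats with this approach: a zipcode with only one row yields that single rate as its "2nd smallest", and if the lowest rate appears twice in a group, both rows returned by nsmallest(2) carry the same value.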
Original:
groupby.nsmallest will give you the smallest and 2nd smallest of each group:
df.groupby('zipcode').rate.nsmallest(2)
Out[2083]:
zipcode
30165 5 245.82
7 264.84
36749 0 245.82
2 264.84
Name: rate, dtype: float64
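Since the question also asks about the lowest and the max rate, here is an alternative sketch (not the approach above) that computes all three per zipcode in one aggregation. The helper second_lowest_distinct is a hypothetical name I introduced; it treats "second lowest" as the second lowest distinct value and returns NaN when a group has fewer than two distinct rates, which is an assumption about what you want in those edge cases.

```python
import pandas as pd

# Sample rebuilt from the question (only the two columns that matter here).
df = pd.DataFrame({
    "zipcode": [36749] * 5 + [30165] * 7,
    "rate": [245.82, 271.77, 264.84, 269.11, 305.02,
             245.82, 271.77, 264.84, 269.11, 305.02, 323.25, 344.81],
})

def second_lowest_distinct(rates: pd.Series) -> float:
    """Second lowest distinct value, or NaN if there are fewer than two."""
    distinct = rates.drop_duplicates().nsmallest(2)
    return distinct.iloc[1] if len(distinct) > 1 else float("nan")

# Named aggregation: one output column per keyword argument.
summary = (
    df.groupby("zipcode")["rate"]
      .agg(lowest="min", second_lowest=second_lowest_distinct, highest="max")
      .reset_index()
)

print(summary)
```

The custom function runs once per group in Python, so on very large frames the nsmallest chain is likely to be faster; the trade-off is that this version makes the handling of ties and single-row groups explicit.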