分组并找到前n个value_counts个 pandas [英] Group by and find top n value_counts pandas

查看:302
本文介绍了分组并找到前n个value_counts个 pandas 的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个出租车数据的数据框,其中有两列,如下所示:

I have a dataframe of taxi data with two columns that looks like this:

Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B

基本上,每一行代表该市镇附近的一辆出租车.现在,我想找到每个行政区中接送次数最多的前5个街区.我试过了:

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough with the most number of pickups. I tried this:

df['Neighborhood'].groupby(df['Borough']).value_counts()

哪个给了我这样的东西:

Which gives me something like this:

borough                          
Bronx          High  Bridge          3424
               Mott Haven            2515
               Concourse Village     1443
               Port Morris           1153
               Melrose                492
               North Riverdale        463
               Eastchester            434
               Concourse              395
               Fordham                252
               Wakefield              214
               Kingsbridge            212
               Mount Hope             200
               Parkchester            191
......

Staten Island  Castleton Corners        4
               Dongan Hills             4
               Eltingville              4
               Graniteville             4
               Great Kills              4
               Castleton                3
               Woodrow                  1

我该如何过滤它,以便仅从每一个中获得前5名?我知道有几个标题相似的问题,但它们对我的情况没有帮助.

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren't helpful to my case.

推荐答案

我认为您可以使用

I think you can use nlargest - you can change 1 to 5:

s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print s
Borough                      
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

print s.groupby(level=[0,1]).nlargest(1)
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

正在创建其他列,指定级别信息

additional columns were getting created, specified level info

这篇关于分组并找到前n个value_counts个 pandas 的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆