Count and Sort with Pandas

Views: 49 Published: 2021/12/27 8:13:00 Tags: python sorting pandas count group-by


Problem description

I have a dataframe built from values in a file, which I have grouped by two columns to get an aggregated count. Now I want to sort by the max count value, but I get the following error:

  KeyError: 'count'
It looks like the count column produced by the group by / agg is some sort of index, so I'm not sure how to do this. I'm a beginner with Python and pandas.
Here's the actual code; please let me know if you need more detail:
def answer_five():
    df = census_df#.set_index(['STNAME'])
    df = df[df['SUMLEV'] == 50]
    df = df[['STNAME','CTYNAME']].groupby(['STNAME']).agg(['count']).sort(['count'])
    #df.set_index(['count'])
    print(df.index)
    # get sorted count max item
    return df.head(5)

Solution
I think you need to add reset_index and then pass ascending=False to sort_values, because sort returns:

  FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
    .sort_values(['count'], ascending=False)


# Parentheses let the method chain span several lines.
df = (df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME']
                              .count()
                              .reset_index(name='count')
                              .sort_values(['count'], ascending=False)
                              .head(5))
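To see where the KeyError likely comes from, here is a small sketch (an illustration added for clarity, not part of the original answer). It assumes the same sample data used in this answer; the names agg_counts, by_tuple and flat_counts are made up for the example. Calling agg(['count']) on a grouped DataFrame produces two-level column labels, so no column is labelled plain 'count'.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# agg(['count']) on a DataFrame groupby yields two-level column labels,
# so there is no column whose label is the plain string 'count'.
agg_counts = df[['STNAME', 'CTYNAME']].groupby(['STNAME']).agg(['count'])
print(agg_counts.columns.tolist())   # [('CTYNAME', 'count')]

# Sorting by the full two-level label works.
by_tuple = agg_counts.sort_values(('CTYNAME', 'count'), ascending=False)

# Selecting one column before counting gives a Series; reset_index(name=...)
# then turns it into a DataFrame with an ordinary 'count' column.
flat_counts = df.groupby('STNAME')['CTYNAME'].count().reset_index(name='count')
print(flat_counts.columns.tolist())  # ['STNAME', 'count']
```

Either route gets around the error; the reset_index(name='count') form is the one this answer uses because the result then sorts by the simple label 'count'.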
Sample:
import pandas as pd

df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

print (df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = (df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME']
                              .count()
                              .reset_index(name='count')
                              .sort_values(['count'], ascending=False)
                              .head(5))

print (df)
  STNAME  count
2      c      5
5      s      4
1      b      3
0      a      2
3      d      1




But it seems you need Series.nlargest:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)
or:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)



  The difference between size and count is:
  
  size counts NaN values, count does not.
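A minimal sketch of that difference, using a tiny made-up frame (df_nan and its values are invented for illustration and are not from the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'a' has one NaN, group 'b' has two.
df_nan = pd.DataFrame({'STNAME': ['a', 'a', 'b', 'b', 'b'],
                       'CTYNAME': [1.0, np.nan, 2.0, np.nan, np.nan]})

grouped = df_nan.groupby('STNAME')['CTYNAME']
sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-NaN values per group

print(sizes.tolist())   # [2, 3]
print(counts.tolist())  # [1, 1]
```

With the sample data in this answer there are no NaN values, so size and count give the same numbers.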
Sample:
df = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
                   'CTYNAME':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})

print (df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e

df = (df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME']
                              .size()
                              .nlargest(5)
                              .reset_index(name='top5'))
print (df)
  STNAME  top5
0      c     5
1      s     4
2      b     3
3      a     2
4      d     1
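As an aside that is not part of the original answer: when only the number of rows per STNAME is needed, Series.value_counts may be a shorter route, since it counts and sorts in descending order in one call. The sketch below assumes the same sample data; top5 is a name chosen for the example. Note that value_counts skips NaN keys in STNAME by default, and the order of tied groups (here d and e) is not something to rely on.

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# Count rows per STNAME value, sorted from most to least frequent,
# then keep the first five entries.
top5 = df['STNAME'].value_counts().head(5)
print(top5.tolist())  # [5, 4, 3, 2, 1]
```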


                        