Finding the index with the maximum number of rows

Problem description

My task:

For the next set of questions, we will be using census data from the United States Census Bureau. Counties are political and geographic subdivisions of states in the United States. This dataset contains population data for counties and states in the US from 2010 to 2015. See this document for a description of the variable names.

The census dataset (census.csv) should be loaded as census_df. Answer questions using this as appropriate.

Question 5

Which state has the most counties in it? (hint: consider the sumlevel key carefully! You'll need this for future questions too...)

This function should return a single string value.

import pandas as pd

census_df = pd.read_csv('census.csv')
census_df = census_df[census_df['SUMLEV']==50]
census_df_2 = census_df.groupby(by='STNAME',axis=0)

This, however, does not seem to group the DataFrame by 'STNAME', which can be seen when executing census_df_2.head().

I suppose this should work on a grouped DataFrame:

def answer_five():
    return census_df_2[ census_df_2['COUNTY'].count() == max( census_df_2['COUNTY'].count() ) ].index().tolist()[0]
answer_five()

Why does the groupby function not work? I've tried changing the axis and using the set_index() function instead but I can't get it to work.

If someone knows another way to solve this problem I'd appreciate it.

Solution

groupby simply returns a GroupBy object; you have to specify an aggregate function to apply to that object, e.g.

census_df.groupby(by='STNAME').aggregate({'COUNTY': 'nunique'}).idxmax()['COUNTY']

gives

'Texas'

See the pandas user guide on grouping ("Group by: split-apply-combine") for an introduction to grouping/aggregating.
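As a rough sketch of the same idea end to end, the snippet below filters to county-level rows, counts the distinct counties per state, and returns the state with the largest count. The data is a made-up stand-in for census.csv (the state names "Alpha" and "Beta" and all rows are invented), and the names toy_df and state_with_most_counties are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for census.csv: the rows are invented for illustration.
# SUMLEV 40 marks a state-level summary row, SUMLEV 50 a county-level row.
toy_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta"],
    "COUNTY": [0, 1, 3, 0, 1, 3, 5],
})


def state_with_most_counties(df: pd.DataFrame) -> str:
    """Return the STNAME value that has the most distinct COUNTY codes."""
    # Keep only county-level rows so state summary rows are not counted.
    counties = df[df["SUMLEV"] == 50]
    # groupby alone yields a GroupBy object; nunique() is the aggregation
    # that turns it into one number per state.
    counts = counties.groupby("STNAME")["COUNTY"].nunique()
    # idxmax() returns the index label (the state name) of the largest count.
    return counts.idxmax()


result = state_with_most_counties(toy_df)
print(result)
```

Selecting the single 'COUNTY' column before aggregating makes the result a Series, so idxmax() gives the state name directly and no extra indexing is needed. If every county row is known to be unique within its state, size() in place of nunique() should give the same counts.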
