How to group and count rows by month and year using Pandas?
Question
I have a dataset with personal data such as name, height, weight and date of birth. I want to build a graph of the number of people born in each month and year. I'm using Python pandas for this, and my strategy was to group by year and month and aggregate with count. But the closest I got is a count of people by year or by month, not by both.
df['birthdate'].groupby(df.birthdate.dt.year).agg('count')
Other questions on Stack Overflow point to a Grouper called TimeGrouper, but searching the pandas documentation turned up nothing. Any ideas?
Recommended answer
To group on multiple criteria, pass a list of the columns or criteria:
df['birthdate'].groupby([df.birthdate.dt.year, df.birthdate.dt.month]).agg('count')
Example:
In [165]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'birthdate':pd.date_range(start=dt.datetime(2015,12,20),end=dt.datetime(2016,3,1))})
df.groupby([df['birthdate'].dt.year, df['birthdate'].dt.month]).agg({'count'})
Out[165]:
                    birthdate
                        count
birthdate birthdate
2015      12               12
2016      1                31
          2                29
          3                 1
Update
As of version 0.23.0 the above code no longer works, because multi-index level names must be unique. You now need to rename the levels for this to work:
In [107]:
df.groupby([df['birthdate'].dt.year.rename('year'), df['birthdate'].dt.month.rename('month')]).agg({'count'})
Out[107]:
            birthdate
                count
year month
2015 12            12
2016 1             31
     2             29
     3              1
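For reference, the renamed-levels approach can be written as a self-contained script. This is a minimal sketch, not part of the original answer: the sample frame reuses the date range from the example above, `size()` stands in for `agg({'count'})` to count rows per group, and the second grouping via `dt.to_period('M')` is an alternative added here that collapses year and month into a single key. The variable names `by_year_month` and `by_period` are made up for illustration.

```python
import pandas as pd

# Sample data (same range as the example above): one birthdate per day
# from 2015-12-20 through 2016-03-01.
df = pd.DataFrame(
    {"birthdate": pd.date_range(start="2015-12-20", end="2016-03-01")}
)

# Approach 1: group by year and month, renaming each key so the
# resulting MultiIndex levels have unique names.
by_year_month = df.groupby(
    [
        df["birthdate"].dt.year.rename("year"),
        df["birthdate"].dt.month.rename("month"),
    ]
).size()  # size() counts the rows in each group

# Approach 2 (an alternative, not from the original answer): group by a
# single monthly Period key such as 2016-02.
by_period = df.groupby(df["birthdate"].dt.to_period("M")).size()

print(by_year_month)
print(by_period)
```

Either result is a Series indexed by the grouping keys, so it can be handed to a plotting call to build the graph described in the question.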