Counting qualitative values based on the date range in Pandas


Problem description

I am learning to use the pandas library and need to analyze and plot the crime data set below. Each row represents one occurrence of a crime. The date_rep column contains daily dates for a year.

The data needs to be grouped by month, and the instances of each specific crime need to be added up per month, like in the table below.

The problem I am running into is that the data in the crime column is qualitative, and I just can't find resources online that help me solve this!

I have been reading up on groupby and different methods of sorting, but what is the most efficient way of accomplishing this? Thank you in advance!

Recommended answer

To reproduce some data:

In [27]: import numpy as np

In [28]: import pandas as pd

In [29]: df = pd.DataFrame({'date_rep':pd.date_range('2012-01-01', periods=100),
    ...:                    'crm_cd_desc':np.random.choice(['robbery', 'traffic', 'assault'], size=100)})


In [30]: df.head()
Out[30]: 
  crm_cd_desc   date_rep
0     traffic 2012-01-01
1     traffic 2012-01-02
2     assault 2012-01-03
3     robbery 2012-01-04

In essence, what you want is a value count of the crime column:

In [31]: df['crm_cd_desc'].value_counts()
Out[31]: 
assault    36
traffic    34
robbery    30
dtype: int64
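As a minimal sketch of what value_counts does with a qualitative column, here is a tiny hand-written Series (the six values are made up for illustration and are not taken from the crime data set):

```python
import pandas as pd

# Hypothetical sample of crime descriptions, chosen only for illustration.
crimes = pd.Series(
    ["robbery", "traffic", "assault", "traffic", "assault", "traffic"]
)

# value_counts tallies each distinct label and sorts by count, descending.
counts = crimes.value_counts()
print(counts)
```

The result is a Series indexed by the distinct labels, so no numeric encoding of the crime column is needed before counting.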

However, you want to do this for each month separately. To group by month, you can use pd.Grouper inside groupby and set its frequency to monthly:

In [34]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts()
Out[34]: 
date_rep           
2012-01-31  traffic    12
            robbery    10
            assault     9
2012-02-29  assault    13
            traffic    11
            robbery     5
2012-03-31  assault    12
            robbery    10
            traffic     9
2012-04-30  robbery     5
            assault     2
            traffic     2
dtype: int64

Then unstack to move the crime types into columns and get the desired result:

In [35]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack()
Out[35]: 
            assault  robbery  traffic
date_rep                             
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
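The steps above can be sketched as a self-contained script. The six rows are invented so the result is reproducible. Two things here are my assumptions rather than part of the answer: the month key is built with dt.to_period('M') instead of pd.Grouper (recent pandas releases, 2.2 and later, deprecate the 'M' offset alias in favor of 'ME', and grouping on periods sidesteps that), and unstack is given fill_value=0 so that a crime type absent from a month shows 0 rather than NaN.

```python
import pandas as pd

# Hypothetical rows; column names follow the question's data set.
df = pd.DataFrame({
    "date_rep": pd.to_datetime([
        "2012-01-05", "2012-01-20", "2012-01-21",
        "2012-02-03", "2012-02-14", "2012-03-09",
    ]),
    "crm_cd_desc": [
        "robbery", "traffic", "traffic",
        "assault", "assault", "robbery",
    ],
})

# Monthly period for each row (assumption: used in place of pd.Grouper).
month = df["date_rep"].dt.to_period("M").rename("month")

# Count each crime type within each month, then pivot types into columns.
monthly = (
    df.groupby(month)["crm_cd_desc"]
      .value_counts()
      .unstack(fill_value=0)
)
print(monthly)
```

If the plot from the question is the goal, a table in this shape can usually be passed straight to monthly.plot(kind="bar"), assuming matplotlib is installed.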

Instead of using value_counts, you can also group by both the month and the crime type and then compute the size of each group:

In [46]: df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack()
Out[46]: 
crm_cd_desc  assault  robbery  traffic
date_rep                              
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
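A third option, not mentioned in the answer itself, is pd.crosstab, which builds the same kind of month-by-crime frequency table in one call. The sketch below uses invented rows and a period-based month key (both are my assumptions) and checks that it agrees with the size-based approach:

```python
import pandas as pd

# Hypothetical rows; column names follow the question's data set.
df = pd.DataFrame({
    "date_rep": pd.to_datetime([
        "2012-01-05", "2012-01-20", "2012-01-21",
        "2012-02-03", "2012-02-14", "2012-03-09",
    ]),
    "crm_cd_desc": [
        "robbery", "traffic", "traffic",
        "assault", "assault", "robbery",
    ],
})

# Monthly period for each row (assumption: used in place of pd.Grouper).
month = df["date_rep"].dt.to_period("M").rename("month")

# Approach from the answer: group by month and crime type, count group sizes.
by_size = df.groupby([month, "crm_cd_desc"]).size().unstack(fill_value=0)

# Alternative: cross-tabulate month against crime type directly.
by_crosstab = pd.crosstab(month, df["crm_cd_desc"])

# Both tables hold the same counts.
same = (by_size.values == by_crosstab.values).all()
print(by_crosstab)
print("identical counts:", same)
```

Which form to prefer is mostly a matter of taste; the groupby version extends more naturally when other aggregations are needed alongside the counts.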
