How do I create a period range of months and fill it with zeroes?
Question
Suppose I have a dataframe of events recorded per month. The data contain only the year and month of the events and how many of those events happened in that month.
import pandas as pd

df = pd.DataFrame({'month': ['2018-01', '2018-02', '2018-04', '2018-05', '2018-06',
                             '2018-07', '2018-10', '2018-11', '2019-01', '2019-02',
                             '2019-03', '2019-05', '2019-07', '2019-11', '2019-12'],
                   'counts': [10, 5, 6, 1, 2, 5, 7, 8, 9, 1, 10, 12, 8, 10, 4]})
df
month counts
0 2018-01 10
1 2018-02 5
2 2018-04 6
3 2018-05 1
4 2018-06 2
5 2018-07 5
6 2018-10 7
7 2018-11 8
8 2019-01 9
9 2019-02 1
10 2019-03 10
11 2019-05 12
12 2019-07 8
13 2019-11 10
14 2019-12 4
As you can see above, the data span January 2018 to December 2019, but not every month has a count. For example, there is no row for March 2018 (2018-03), and several other months in the range are missing as well.
I want to add the missing months with a count of zero, so basically I want to insert {'month': '2018-03', 'counts': 0} in the right position, and do the same for every other month that is missing from the range.
Here is what I have tried so far.
First, I converted the month column to a monthly period.
df['month'] = pd.to_datetime(df['month']).dt.to_period('M')
The code above works fine.
Then I tried to create a date range with monthly frequency, but this does not work.
idx = pd.date_range(min(df['month']), max(df['month']), freq='M')
The error is: ValueError: Cannot convert Period to Timestamp unambiguously. Use to_timestamp
What should I do? Thanks.
Recommended answer
Use period_range to build the full range of months, then set the month column as the index (which gives a PeriodIndex) and use DataFrame.reindex with fill_value=0:
df['month'] = pd.to_datetime(df['month']).dt.to_period('M')
idx = pd.period_range(df['month'].min(), df['month'].max(), freq='M')
df = df.set_index('month').reindex(idx, fill_value=0)
print(df)
counts
2018-01 10
2018-02 5
2018-03 0
2018-04 6
2018-05 1
2018-06 2
2018-07 5
2018-08 0
2018-09 0
2018-10 7
2018-11 8
2018-12 0
2019-01 9
2019-02 1
2019-03 10
2019-04 0
2019-05 12
2019-06 0
2019-07 8
2019-08 0
2019-09 0
2019-10 0
2019-11 10
2019-12 4
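The result above keeps the months in the index. Since the question asks for rows of the form {'month': ..., 'counts': ...}, here is a minimal sketch of one way to move the index back into a month column. It uses a shortened copy of the data, and the variable name out is only illustrative; rename_axis is used here on the assumption that the reindexed index carries no name of its own.

```python
import pandas as pd

# Shortened sample of the question's data (2018-03 is missing).
df = pd.DataFrame({'month': ['2018-01', '2018-02', '2018-04'],
                   'counts': [10, 5, 6]})

# Same steps as in the answer: monthly periods, full range, reindex with zeros.
df['month'] = pd.to_datetime(df['month']).dt.to_period('M')
idx = pd.period_range(df['month'].min(), df['month'].max(), freq='M')

out = (df.set_index('month')
         .reindex(idx, fill_value=0)   # missing months get counts == 0
         .rename_axis('month')         # name the new index so reset_index yields 'month'
         .reset_index())               # move the PeriodIndex back into a column

print(out)
```

If the months are needed as plain strings again, out['month'].astype(str) should give values such as '2018-03'.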