Add missing dates to pandas dataframe
Question
My data can have multiple events on a given date, or no events on a date. I take these events, get a count by date, and plot them. However, when I plot them, my two series don't always match.
idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()
In the code above, idx becomes a range of, say, 30 dates: 09-01-2013 to 09-30-2013. However, s may only have 25 or 26 days, because no events happened on some dates. I then get an AssertionError when I try to plot, because the sizes don't match:
fig, ax = plt.subplots()
ax.bar(idx.to_pydatetime(), s, color='green')
What's the proper way to tackle this? Should I remove the dates with no values from idx, or (which I'd rather do) add the missing dates to the series with a count of 0? I'd rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?
Here's a snippet of s (df.groupby(['simpleDate']).size()); notice there are no entries for 04 and 05.
09-02-2013 2
09-03-2013 10
09-06-2013 5
09-07-2013 1
Recommended answer
You could use Series.reindex:
import pandas as pd
idx = pd.date_range('09-01-2013', '09-30-2013')
s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx, fill_value=0)
print(s)
yields
2013-09-01 0
2013-09-02 2
2013-09-03 10
2013-09-04 0
2013-09-05 0
2013-09-06 5
2013-09-07 1
2013-09-08 0
...
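The same idea can be applied directly to the grouped counts from the question. The following is a minimal sketch, not the asker's actual data: the DataFrame contents are made up for illustration, and only the column name simpleDate comes from the question. It assumes that column already holds datetime values, so the index produced by groupby lines up with the one from pd.date_range.

```python
import pandas as pd

# Hypothetical event log (illustrative data only): one row per event,
# with several events allowed on the same day and some days missing.
df = pd.DataFrame({
    "simpleDate": pd.to_datetime([
        "2013-09-02", "2013-09-02",
        "2013-09-03",
        "2013-09-06", "2013-09-06", "2013-09-06",
        "2013-09-07",
    ])
})

# Full daily range between the first and last event date.
idx = pd.date_range(df["simpleDate"].min(), df["simpleDate"].max())

# Count events per date; days without events are absent at this point.
s = df.groupby("simpleDate").size()

# Align the counts to the full range, filling the absent days with 0.
s = s.reindex(idx, fill_value=0)

print(len(idx), len(s))
print(s)
```

After the reindex, s has exactly one entry per date in idx, so a call such as ax.bar(s.index, s.values, color='green') receives the same number of x positions and bar heights.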