Filling date gaps in a pandas DataFrame
Question
I have a pandas DataFrame (loaded from a .csv) with a date-time index, where there should be one entry per day. The problem is that I have gaps, i.e. there are days for which I have no data at all. What is the easiest way to insert rows (days) in the gaps? Also, is there a way to control what is inserted in the columns as data? Say 0, or a copy of the previous day's values, or values that slide up or down between the previous date's and the next date's data values.
Thanks.
Here is an example; 01-03 and 01-04 are missing:
In [60]: df['2015-01-06':'2015-01-01']
Out[60]:
               Rate  High (est)  Low (est)
Date
2015-01-06  1.19643      0.0000     0.0000
2015-01-05  1.20368      1.2186     1.1889
2015-01-02  1.21163      1.2254     1.1980
2015-01-01  1.21469      1.2282     1.2014
Still experimenting, but this seems to solve the problem:
df.set_index(pd.DatetimeIndex(df.Date),inplace=True)
and then resample. The reason is that importing the .csv with a header column named Date does not actually create a DatetimeIndex but a FrozenList (whatever that means), while resample() expects one: if isinstance(ax, DatetimeIndex): .....
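One way to get a DatetimeIndex directly at load time is to let read_csv parse the Date column and use it as the index. The sketch below is not the asker's code: the CSV text is a hypothetical stand-in for the real file, built from the sample rows shown above.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the rows are copied from the example in the question.
csv_text = """Date,Rate,High (est),Low (est)
2015-01-06,1.19643,0.0000,0.0000
2015-01-05,1.20368,1.2186,1.1889
2015-01-02,1.21163,1.2254,1.1980
2015-01-01,1.21469,1.2282,1.2014
"""

# parse_dates converts the Date column to datetimes and
# index_col makes that column the index of the frame.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# resample() needs this to be True.
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)
print(is_datetime_index)
```

With a real file, the io.StringIO(...) argument would simply be the file path.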
Here is my final solution:
# make dates the index
self.df.set_index(pd.DatetimeIndex(self.df.Date), inplace=True)
# fill the gaps (older pandas spelled this resample('D', fill_method='pad'))
self.df = self.df.resample('D').ffill()
# fix the Date column
self.df.Date = self.df.index.values
I had to fix the Date column because resample() only lets you pad it, so the new rows get a copy of the previous row's date. It does fix the index correctly, though, so I could use the index to repair the Date column.
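A possible way to avoid repairing the Date column is to move it into the index before resampling and turn the index back into a column afterwards. This is a sketch under assumptions, not the asker's code: the small frame is built by hand from the Rate values in the question, and the names filled and with_date_column are made up for illustration.

```python
import pandas as pd

# Hand-built sample using the Rate values from the question (01-03 and 01-04 missing).
df = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"],
    "Rate": [1.21469, 1.21163, 1.20368, 1.19643],
})
df["Date"] = pd.to_datetime(df["Date"])

# set_index drops the Date column by default, so only the index carries the dates.
filled = df.set_index("Date").resample("D").ffill()

# reset_index turns the corrected daily index back into a Date column.
with_date_column = filled.reset_index()
print(with_date_column)
```

Because the dates live only in the index while resampling, the padded rows never carry a stale date.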
Here is a snippet of the data after correction:
2015-01-29 2015-01-29 1.13262 0.0000 0.0000
2015-01-30 2015-01-30 1.13161 1.1450 1.1184
2015-01-31 2015-01-31 1.13161 1.1450 1.1184
2015-02-01 2015-02-01 1.13161 1.1450 1.1184
01-30 and 01-31 are the newly generated rows.
Recommended answer
You could resample by day, e.g. using the mean if there are multiple entries per day:
df.resample('D').mean()  # older pandas: df.resample('D', how='mean')
You can then ffill to replace the NaNs with the previous day's result.
See the section on up- and downsampling in the docs.
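The three fill options asked about in the question (zeros, a copy of the previous day, or values sliding between the neighbouring days) could be sketched as follows. The Rate values here are made up so the results are easy to check by eye, and the variable names are illustrative only.

```python
import pandas as pd

# Made-up values with a two-day gap (01-03 and 01-04 are missing).
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"])
df = pd.DataFrame({"Rate": [1.0, 2.0, 5.0, 6.0]}, index=idx)

# Daily resampling inserts the missing days as NaN rows.
daily = df.resample("D").mean()

zeros = daily.fillna(0)                       # fill the gaps with 0
copied = daily.ffill()                        # copy the previous day's values
linear = daily.interpolate(method="linear")   # slide evenly between neighbours

print(pd.concat({"zeros": zeros, "copied": copied, "linear": linear}, axis=1))
```

Linear interpolation here treats the rows as evenly spaced, which holds after daily resampling.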