Filling date gaps in pandas dataframe


Question

I have a pandas DataFrame (loaded from a .csv) with a date-time index, where there should be one entry per day. The problem is that I have gaps, i.e. there are days for which I have no data at all. What is the easiest way to insert rows (days) into the gaps? Also, is there a way to control what is inserted into the columns as data? Say 0, OR a copy of the previous day's values, OR sliding increasing/decreasing values filled in across the range from the previous date's values to the next date's values.

Thanks.

Here is an example; 01-03 and 01-04 are missing:

In [60]: df['2015-01-06':'2015-01-01']
Out[60]: 
           Rate  High (est)  Low (est)
Date                                      
2015-01-06  1.19643      0.0000     0.0000
2015-01-05  1.20368      1.2186     1.1889
2015-01-02  1.21163      1.2254     1.1980
2015-01-01  1.21469      1.2282     1.2014
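
As a side note, the missing days can also be listed programmatically. The following is only a minimal sketch: it rebuilds the Rate column from the output above by hand (the other columns are left out for brevity) and compares the index against a full daily range.

```python
import pandas as pd

# Sample rows copied from the output above (Rate column only, for brevity).
df = pd.DataFrame(
    {"Rate": [1.19643, 1.20368, 1.21163, 1.21469]},
    index=pd.DatetimeIndex(
        ["2015-01-06", "2015-01-05", "2015-01-02", "2015-01-01"], name="Date"
    ),
)

# Every calendar day between the first and last date in the index.
all_days = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Days that appear in the full range but not in the data.
missing = all_days.difference(df.index)
print(missing.strftime("%Y-%m-%d").tolist())
```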


Still experimenting, but this seems to solve the problem:

df.set_index(pd.DatetimeIndex(df.Date),inplace=True)

and then resample. The reason is that importing the .csv with a header column named Date does not actually create a date-time index, but a "Frozen-list", whatever that means. resample() expects: if isinstance(ax, DatetimeIndex): .....
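
One way to get a DatetimeIndex directly at load time is to let read_csv parse the dates. This is a sketch under assumptions: the CSV text below is a hypothetical stand-in for the real file, reduced to the Date and Rate columns.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the file from the question.
csv_text = """Date,Rate
2015-01-06,1.19643
2015-01-05,1.20368
2015-01-02,1.21163
2015-01-01,1.21469
"""

# parse_dates + index_col ask pandas to build a DatetimeIndex while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# resample() needs a datetime-like index, so check before calling it.
print(isinstance(df.index, pd.DatetimeIndex))
```

With a real file, the io.StringIO(...) argument would be replaced by the file path.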

Here is my final solution:

  #make dates the index
  self.df.set_index(pd.DatetimeIndex(self.df.Date), inplace=True)
  #fill the gaps (older pandas: resample('D', fill_method='pad'), since removed)
  self.df = self.df.resample('D').ffill()
  #fix the Date column
  self.df.Date = self.df.index.values

I had to fix the Date column, because resample() only lets you pad it, so the inserted rows would carry the previous row's date. It does fix the index correctly though, so I could use the index to fix the Date column.
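
An alternative that avoids repairing the Date column afterwards is to keep the dates only in the index while resampling and turn the index back into a column at the end. This is a rough sketch, not the poster's code; it uses the Rate values from the example above and current pandas method names.

```python
import pandas as pd

# Rate values from the question's example, with Date as a plain string column.
df = pd.DataFrame(
    {
        "Date": ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"],
        "Rate": [1.21469, 1.21163, 1.20368, 1.19643],
    }
)

# Convert the strings and move Date into the index (the column is dropped).
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

# Upsample to daily frequency, copying the previous day's values forward.
filled = df.resample("D").ffill()

# Turn the index back into a regular Date column; each row keeps its own date.
filled = filled.reset_index()
print(filled)
```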

Here is a snippet of the data after correction:

2015-01-29 2015-01-29  1.13262      0.0000     0.0000
2015-01-30 2015-01-30  1.13161      1.1450     1.1184
2015-01-31 2015-01-31  1.13161      1.1450     1.1184
2015-02-01 2015-02-01  1.13161      1.1450     1.1184

01-31 and 02-01 are the newly generated rows (each repeats the 01-30 values).

Recommended answer

You could resample by day, e.g. using the mean if there are multiple entries per day:

df.resample('D').mean()

You can then ffill to replace the NaNs with the previous day's result.
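
The three fill options asked about in the question (0, the previous day's values, or sliding values between the neighbouring days) might look roughly like this. It is a sketch using only the Rate column from the example; the variable names are illustrative, and interpolate() is used here with its default linear method.

```python
import pandas as pd

# Rate values from the question's example; 01-03 and 01-04 are missing.
df = pd.DataFrame(
    {"Rate": [1.21469, 1.21163, 1.20368, 1.19643]},
    index=pd.DatetimeIndex(
        ["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"], name="Date"
    ),
)

# Insert the missing days as rows of NaN.
daily = df.resample("D").mean()

zeros = daily.fillna(0)        # fill the gaps with 0
previous = daily.ffill()       # copy the previous day's values
sliding = daily.interpolate()  # linear steps between the neighbouring days

print(sliding)
```

Which variant is appropriate depends on what a gap means in the data: 0 suits counts, a forward fill suits values that stay valid until updated, and interpolation suits quantities that change gradually.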

See upsampling and downsampling in the docs.
