Find gaps in pandas time series dataframe sampled at 1 minute intervals and fill the gaps with new rows

Problem description

I have a data frame containing financial data sampled at 1 minute intervals. Occasionally a row or two of data might be missing.

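  • I'm looking for a good (simple and efficient) way to insert new rows into the data frame at the places where data is missing.
  • The new rows can be empty apart from the index, which contains the timestamp.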
 #Example Input---------------------------------------------
                      open     high     low      close
 2019-02-07 16:01:00  124.624  124.627  124.647  124.617  
 2019-02-07 16:04:00  124.646  124.655  124.664  124.645  

 # Desired Output-------------------------------------------
                      open     high     low      close
 2019-02-07 16:01:00  124.624  124.627  124.647  124.617  
 2019-02-07 16:02:00  NaN      NaN      NaN      NaN
 2019-02-07 16:03:00  NaN      NaN      NaN      NaN
 2019-02-07 16:04:00  124.646  124.655  124.664  124.645 

My current method is based on this post - Find missing minute data in time series data using pandas - which only advises how to identify the gaps, not how to fill them.
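
For reference, identifying the gaps can be sketched roughly as follows. This is a minimal illustration, not the code from the linked post: the sample data and the names full_idx and missing are assumptions made up for the example. It builds the complete 1-minute grid with pd.date_range and uses Index.difference to list the timestamps that are absent.

```python
import pandas as pd

# Hypothetical sample data: minute bars where 16:01-16:03 are missing
idx = pd.to_datetime(["2019-02-07 16:00", "2019-02-07 16:04", "2019-02-07 16:05"])
prices = pd.DataFrame({"close": [124.620, 124.617, 124.630]}, index=idx)

# Complete 1-minute grid spanning the first to the last observed timestamp
full_idx = pd.date_range(prices.index.min(), prices.index.max(), freq="min")

# Timestamps on the grid that do not appear in the data
missing = full_idx.difference(prices.index)
print(missing)
```

This only reports where the gaps are; the rows still have to be inserted separately.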

What I'm doing is creating a DatetimeIndex of 1-minute intervals. Using this index, I create an entirely new dataframe, which is then merged into my original dataframe, thus filling the gaps. The code is shown below. It seems quite a roundabout way of doing this, and I would like to know if there is a better way. Maybe by resampling the data?

import pandas as pd
from datetime import datetime

# Initialise prices dataframe with missing data
prices = pd.DataFrame([[datetime(2019,2,7,16,0), 124.634, 124.624, 124.65,  124.62],
                       [datetime(2019,2,7,16,4), 124.624, 124.627, 124.647, 124.617]])
prices.columns = ['datetime','open','high','low','close']
prices = prices.set_index('datetime')
print(prices)

# Create a new dataframe with the complete set of time intervals.
# pd.date_range is used here because pd.DatetimeIndex(start=..., end=..., freq=...)
# is no longer accepted by pandas 1.0 and later.
idx_ref = pd.date_range(start=datetime(2019,2,7,16,0), end=datetime(2019,2,7,16,4), freq='min')
df = pd.DataFrame(index=idx_ref)

# Merge the two dataframes on their indexes
prices = pd.merge(df, prices, how='outer', left_index=True, right_index=True)
print(prices)

Recommended answer

Use DataFrame.asfreq, which works on a DatetimeIndex (here the 'datetime' column is set as the index first):

prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
                        open     high      low    close
datetime                                               
2019-02-07 16:00:00  124.634  124.624  124.650  124.620
2019-02-07 16:01:00      NaN      NaN      NaN      NaN
2019-02-07 16:02:00      NaN      NaN      NaN      NaN
2019-02-07 16:03:00      NaN      NaN      NaN      NaN
2019-02-07 16:04:00  124.624  124.627  124.647  124.617
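
As a self-contained sketch of the same idea, the snippet below rebuilds the sample frame and fills the gaps three ways: with asfreq, with an explicit reindex against pd.date_range, and with resample (which the question asks about). The variable names filled, reindexed, resampled and ffilled are only illustrative, and the forward-fill variant is an optional extra that the question does not request.

```python
import pandas as pd

# Rebuild the sample data from the question, indexed by timestamp
prices = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2019-02-07 16:00", "2019-02-07 16:04"]),
        "open": [124.634, 124.624],
        "high": [124.624, 124.627],
        "low": [124.650, 124.647],
        "close": [124.620, 124.617],
    }
).set_index("datetime")

# 1) asfreq: conform the index to a 1-minute frequency; new rows are NaN
filled = prices.asfreq("min")

# 2) reindex: the same result, with the target index built explicitly
full_idx = pd.date_range(
    prices.index.min(), prices.index.max(), freq="min", name="datetime"
)
reindexed = prices.reindex(full_idx)

# 3) resample followed by asfreq also yields the 1-minute grid
resampled = prices.resample("min").asfreq()

# Optional: carry the last known values forward instead of leaving NaN
ffilled = prices.asfreq("min", method="ffill")

print(filled)
```

For this use case asfreq is the shortest form. The explicit reindex may be preferable when the grid should extend beyond the first and last observed timestamps, since the range passed to pd.date_range can be chosen freely.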
