Python pandas: how to truncate a DatetimeIndex and fill missing data only in certain intervals

Views: 800  Published: 2017/3/26 3:49:39  Tags: python datetime dataframe pandas truncate

Problem description

 2012-10-08 07:12:22            0.0    0          0  2315.6    0     0.0    0
 2012-10-08 09:14:00         2306.4   20  326586240  2306.4  472  2306.8    4
 2012-10-08 09:15:00         2306.8   34  249805440  2306.8  361  2308.0   26
 2012-10-08 09:15:01         2308.0    1   53309040  2307.4   77  2308.6    9
 2012-10-08 09:15:01.500000  2308.2    1  124630140  2307.0  180  2308.4    1
 2012-10-08 09:15:02         2307.0    5   85846260  2308.2  124  2308.0    9
 2012-10-08 09:15:02.500000  2307.0    3  128073540  2307.0  185  2307.6   11
 ......
 2012-10-09 07:19:30            0.0    0          0  2276.6    0     0.0    0
 2012-10-09 09:14:00         2283.2   80   98634240  2283.2  144  2283.4    1
 2012-10-09 09:15:00         2285.2   18  126814260  2285.2  185  2285.6    3
 2012-10-09 09:15:01         2285.8    6   98719560  2286.8  144  2287.0   25
 2012-10-09 09:15:01.500000  2287.0   36  144759420  2288.8  211  2289.0    4
 2012-10-09 09:15:02         2287.4    6  109829280  2287.4  160  2288.6    5
 ......

I have a DataFrame that contains several days of exchange trading data, as shown above. The data I want is from 9:00:00 AM - 11:30:00 AM and 13:00:00 - 15:15:00, so I would like to do two things:

1. For each date in the DataFrame, truncate to only have data in the range of 9:00:00 AM - 11:30:00 AM and 13:00:00 - 15:15:00.
2. With the range in 1., fill missing data with a frequency of 500 milliseconds.

The pandas truncate function only lets me truncate by date, but here I would like to truncate by datetime.time. Also, how can I fill the missing data only for the intervals I am interested in?

Thanks a lot.

Solution

for each date in the DataFrame truncate to only have data in the range of 9:00:00AM - 11:30:00AM and 13:00:00 - 15:15:00

Use index slicing for that, e.g.:

df = df[start_timestamp:end_timestamp]

with the range in 1., fill missing data with a frequency of 500 milliseconds

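Generate a new DataFrame whose index has a regular 500 ms frequency, then merge it with the original one using an outer join. This gives you a DataFrame with rows at regular intervals, where the rows for missing observations contain NaN values. Finally, fill the NaN values with fillna. (The example below uses a 1-minute grid, a left join from the regular index, and np.where instead; the result is the same as long as every original timestamp lies on the regular grid.)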

Example:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: data = pd.DataFrame({"value": np.arange(5)}, index=pd.date_range("2013/02/03", periods=5, freq="3Min"))

In [4]: data
Out[4]: 
                     value
2013-02-03 00:00:00      0
2013-02-03 00:03:00      1
2013-02-03 00:06:00      2
2013-02-03 00:09:00      3
2013-02-03 00:12:00      4

In [5]: filler = pd.DataFrame({"value": [100] * 15}, index=pd.date_range("2013/02/03", periods=15, freq="1Min"))                                                                           

In [6]: filler
Out[6]: 
                     value
2013-02-03 00:00:00    100
2013-02-03 00:01:00    100
2013-02-03 00:02:00    100
2013-02-03 00:03:00    100
2013-02-03 00:04:00    100
2013-02-03 00:05:00    100
2013-02-03 00:06:00    100
2013-02-03 00:07:00    100
2013-02-03 00:08:00    100
2013-02-03 00:09:00    100
2013-02-03 00:10:00    100
2013-02-03 00:11:00    100
2013-02-03 00:12:00    100
2013-02-03 00:13:00    100
2013-02-03 00:14:00    100

In [7]: merged = filler.merge(data, how='left', left_index=True, right_index=True)                                                                                                         

In [8]: merged["value"] = np.where(np.isfinite(merged.value_y), merged.value_y, merged.value_x)                                                                                            

In [9]: merged
Out[9]: 
                     value_x  value_y  value
2013-02-03 00:00:00      100        0      0
2013-02-03 00:01:00      100      NaN    100
2013-02-03 00:02:00      100      NaN    100
2013-02-03 00:03:00      100        1      1
2013-02-03 00:04:00      100      NaN    100
2013-02-03 00:05:00      100      NaN    100
2013-02-03 00:06:00      100        2      2
2013-02-03 00:07:00      100      NaN    100
2013-02-03 00:08:00      100      NaN    100
2013-02-03 00:09:00      100        3      3
2013-02-03 00:10:00      100      NaN    100
2013-02-03 00:11:00      100      NaN    100
2013-02-03 00:12:00      100        4      4
2013-02-03 00:13:00      100      NaN    100
2013-02-03 00:14:00      100      NaN    100

In [10]: merged['2013-02-03 00:01:00':'2013-02-03 00:10:00']                                                                                                                                
Out[10]: 
                     value_x  value_y  value
2013-02-03 00:01:00      100      NaN    100
2013-02-03 00:02:00      100      NaN    100
2013-02-03 00:03:00      100        1      1
2013-02-03 00:04:00      100      NaN    100
2013-02-03 00:05:00      100      NaN    100
2013-02-03 00:06:00      100        2      2
2013-02-03 00:07:00      100      NaN    100
2013-02-03 00:08:00      100      NaN    100
2013-02-03 00:09:00      100        3      3
2013-02-03 00:10:00      100      NaN    100
