Python pandas, how to truncate a DatetimeIndex and fill missing data only in certain intervals
Problem description
2012-10-08 07:12:22 0.0 0 0 2315.6 0 0.0 0
2012-10-08 09:14:00 2306.4 20 326586240 2306.4 472 2306.8 4
2012-10-08 09:15:00 2306.8 34 249805440 2306.8 361 2308.0 26
2012-10-08 09:15:01 2308.0 1 53309040 2307.4 77 2308.6 9
2012-10-08 09:15:01.500000 2308.2 1 124630140 2307.0 180 2308.4 1
2012-10-08 09:15:02 2307.0 5 85846260 2308.2 124 2308.0 9
2012-10-08 09:15:02.500000 2307.0 3 128073540 2307.0 185 2307.6 11
......
2012-10-09 07:19:30 0.0 0 0 2276.6 0 0.0 0
2012-10-09 09:14:00 2283.2 80 98634240 2283.2 144 2283.4 1
2012-10-09 09:15:00 2285.2 18 126814260 2285.2 185 2285.6 3
2012-10-09 09:15:01 2285.8 6 98719560 2286.8 144 2287.0 25
2012-10-09 09:15:01.500000 2287.0 36 144759420 2288.8 211 2289.0 4
2012-10-09 09:15:02 2287.4 6 109829280 2287.4 160 2288.6 5
......
I have a DataFrame that contains several days of exchange trading data, as shown above. The data I want is from 9:00:00 AM - 11:30:00 AM and 13:00:00 - 15:15:00, so I would like to do two things:
- for each date in the DataFrame, truncate to keep only data in the ranges 9:00:00 AM - 11:30:00 AM and 13:00:00 - 15:15:00
- within the ranges from 1., fill missing data at a frequency of 500 milliseconds
The pandas truncate function only allows me to truncate according to date, but I would like to truncate according to datetime.time here. Also, how can I fill the missing data only for the intervals I am interested in?
Thanks a lot.
Solution
- For each date in the DataFrame, truncate to keep only data in the ranges 9:00:00 AM - 11:30:00 AM and 13:00:00 - 15:15:00
Use index slicing for that. A slice selects one contiguous span, so apply it once per date and session window, e.g.:
df = df[start_timestamp:end_timestamp]
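Because the question asks for the same time-of-day windows on every date, a minimal sketch using `DataFrame.between_time` may be more convenient than slicing day by day. This is an alternative to the slicing shown above, not part of the original answer; the sample timestamps, the `price` column, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: ticks across two days, some outside session hours.
idx = pd.to_datetime([
    "2012-10-08 07:12:22",
    "2012-10-08 09:14:00",
    "2012-10-08 11:45:00",
    "2012-10-08 13:30:00",
    "2012-10-09 07:19:30",
    "2012-10-09 09:15:00",
    "2012-10-09 15:20:00",
])
df = pd.DataFrame(
    {"price": [0.0, 2306.4, 2310.0, 2312.2, 0.0, 2285.2, 2290.0]},
    index=idx,
)

# between_time filters on the time of day only, so it covers every date at once.
morning = df.between_time("09:00:00", "11:30:00")
afternoon = df.between_time("13:00:00", "15:15:00")

# Stack the two sessions and restore chronological order.
sessions = pd.concat([morning, afternoon]).sort_index()
print(sessions)
```

`between_time` requires a DatetimeIndex and includes both endpoints by default.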
- Within the ranges from 1., fill missing data at a frequency of 500 milliseconds
Generate a new DataFrame whose index has a regular 500 ms spacing. Merge it with the original one on the index, using an outer join (or a left join from the regular-index frame, as in the example below). This gives you a DataFrame with rows at regular intervals; rows for missing observations contain NaN. Then fill the NaN values with fillna.
Example:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: data = pd.DataFrame({"value": np.arange(5)}, index=pd.date_range("2013/02/03", periods=5, freq="3min"))
In [4]: data
Out[4]:
value
2013-02-03 00:00:00 0
2013-02-03 00:03:00 1
2013-02-03 00:06:00 2
2013-02-03 00:09:00 3
2013-02-03 00:12:00 4
In [5]: filler = pd.DataFrame({"value": [100] * 15}, index=pd.date_range("2013/02/03", periods=15, freq="1min"))
In [6]: filler
Out[6]:
value
2013-02-03 00:00:00 100
2013-02-03 00:01:00 100
2013-02-03 00:02:00 100
2013-02-03 00:03:00 100
2013-02-03 00:04:00 100
2013-02-03 00:05:00 100
2013-02-03 00:06:00 100
2013-02-03 00:07:00 100
2013-02-03 00:08:00 100
2013-02-03 00:09:00 100
2013-02-03 00:10:00 100
2013-02-03 00:11:00 100
2013-02-03 00:12:00 100
2013-02-03 00:13:00 100
2013-02-03 00:14:00 100
In [7]: merged = filler.merge(data, how='left', left_index=True, right_index=True)
In [8]: merged["value"] = np.where(np.isfinite(merged.value_y), merged.value_y, merged.value_x)
In [9]: merged
Out[9]:
value_x value_y value
2013-02-03 00:00:00 100 0 0
2013-02-03 00:01:00 100 NaN 100
2013-02-03 00:02:00 100 NaN 100
2013-02-03 00:03:00 100 1 1
2013-02-03 00:04:00 100 NaN 100
2013-02-03 00:05:00 100 NaN 100
2013-02-03 00:06:00 100 2 2
2013-02-03 00:07:00 100 NaN 100
2013-02-03 00:08:00 100 NaN 100
2013-02-03 00:09:00 100 3 3
2013-02-03 00:10:00 100 NaN 100
2013-02-03 00:11:00 100 NaN 100
2013-02-03 00:12:00 100 4 4
2013-02-03 00:13:00 100 NaN 100
2013-02-03 00:14:00 100 NaN 100
In [10]: merged['2013-02-03 00:01:00':'2013-02-03 00:10:00']
Out[10]:
value_x value_y value
2013-02-03 00:01:00 100 NaN 100
2013-02-03 00:02:00 100 NaN 100
2013-02-03 00:03:00 100 1 1
2013-02-03 00:04:00 100 NaN 100
2013-02-03 00:05:00 100 NaN 100
2013-02-03 00:06:00 100 2 2
2013-02-03 00:07:00 100 NaN 100
2013-02-03 00:08:00 100 NaN 100
2013-02-03 00:09:00 100 3 3
2013-02-03 00:10:00 100 NaN 100