Missing samples of a dataframe in pandas
Question
My df:
In [163]: df.head()
Out[163]:
x-axis y-axis z-axis
time
2017-07-27 06:23:08 -0.107666 -0.068848 0.963623
2017-07-27 06:23:08 -0.105225 -0.070068 0.963867
.....
I set the index as datetime. The sampling rate (nominally 10 Hz) is not constant across the dataframe, so some seconds contain only 8 or 9 samples.
- I would like to specify the milliseconds in the datetime (06:23:08.100, 06:23:08.200, etc.)
- I would also like to interpolate the missing samples.
Any ideas on how to do this in pandas?
Recommended answer
First, let's create some sample data that may resemble yours.
import pandas as pd
from datetime import timedelta
from datetime import datetime
base = datetime.now()
date_list = [base - timedelta(days=x) for x in range(0, 2)]
values = [v for v in range(2)]
df = pd.DataFrame.from_dict({'Date': date_list, 'values': values})
df = df.set_index('Date')
df
values
Date
2017-08-18 20:42:08.563878 0
2017-08-17 20:42:08.563878 1
Now we will create another dataframe with one data point every 100 milliseconds.
min_val = df.index.min()
max_val = df.index.max()
all_val = []
while min_val <= max_val:
    all_val.append(min_val)
    min_val += timedelta(milliseconds=100)
# len(all_val) 864001
df_new = pd.DataFrame.from_dict({'Date': all_val})
df_new = df_new.set_index('Date')
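As an alternative to the explicit while loop, the same 100 ms grid can probably be built with pd.date_range. This is a minimal sketch, not part of the original answer; the two-row df below is a hypothetical stand-in for the sample data created earlier, with fixed timestamps so the result is reproducible.

```python
import pandas as pd

# Hypothetical stand-in for the sample frame above (fixed timestamps, one day apart).
df = pd.DataFrame(
    {"values": [1, 0]},
    index=pd.DatetimeIndex(
        ["2017-08-17 20:42:08.563878", "2017-08-18 20:42:08.563878"],
        name="Date",
    ),
)

# One timestamp every 100 ms from the first to the last sample, both ends included.
grid = pd.date_range(df.index.min(), df.index.max(), freq="100ms", name="Date")

# Empty frame that carries only the regular index, ready to be joined with df.
df_new = pd.DataFrame(index=grid)
```

The resulting df_new can be joined with df in the same way as the loop-built version.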
Let's join both dataframes so that every missing row has an index entry but no value.
final_df = df_new.join(df)
final_df
values
Date
2017-08-17 20:42:08.563878 1.0
2017-08-17 20:42:08.663878 NaN
2017-08-17 20:42:08.763878 NaN
2017-08-17 20:42:08.863878 NaN
2017-08-17 20:42:08.963878 NaN
2017-08-17 20:42:09.063878 NaN
2017-08-17 20:42:09.163878 NaN
Now interpolate the data:
final_df.interpolate()
values
Date
2017-08-17 20:42:08.563878 1.000000
2017-08-17 20:42:08.663878 0.999999
2017-08-17 20:42:08.763878 0.999998
2017-08-17 20:42:08.863878 0.999997
2017-08-17 20:42:08.963878 0.999995
2017-08-17 20:42:09.063878 0.999994
2017-08-17 20:42:09.163878 0.999993
2017-08-17 20:42:09.263878 0.999992
Some interpolation strategies: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html
UPDATE: As per the discussion in the comments:
Say our initial data does not have millisecond information.
# Assumes 'Date' is a regular column here, not the index
df_new_date_without_miliseconds = df_new['Date']
df_new_date_without_miliseconds[0] # Timestamp('2017-08-17 21:45:49')
max_value_date = df_new_date_without_miliseconds[0]
max_value_miliseconds = df_new_date_without_miliseconds[0]
updated_dates = []
for val in df_new_date_without_miliseconds:
    if val == max_value_date:
        # Same second as before: step 100 ms past the last assigned timestamp
        val = max_value_miliseconds + timedelta(milliseconds=100)
        max_value_miliseconds = val
    elif val > max_value_date:
        # New second: restart from the whole-second timestamp
        max_value_date = val + timedelta(milliseconds=0)
        max_value_miliseconds = val
    updated_dates.append(val)
output:
[Timestamp('2017-08-17 21:45:49.100000'),
Timestamp('2017-08-17 21:45:49.200000'),
Timestamp('2017-08-17 21:45:49.300000'),
Timestamp('2017-08-17 21:45:50'),
Timestamp('2017-08-17 21:45:50.100000'),
Assign the new values to the DataFrame
df_new['Date'] = updated_dates
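The loop above could also be expressed without explicit iteration by numbering the rows within each second and converting that position to a 100 ms offset. This is a sketch under assumptions: the five-row df_new is hypothetical, rows are assumed to be in chronological order already, and, unlike the loop, the first row of every second (including the very first one) keeps its .000 timestamp.

```python
import pandas as pd

# Hypothetical frame with whole-second timestamps only; 'Date' is a regular column.
df_new = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:49",
        "2017-08-17 21:45:50",
        "2017-08-17 21:45:50",
    ])
})

# Position of each row within its second: 0, 1, 2, ... restarting at every new second.
position = df_new.groupby("Date").cumcount()

# Shift each row by position * 100 ms so that timestamps within a second become distinct.
df_new["Date"] = df_new["Date"] + pd.to_timedelta(position * 100, unit="ms")
```

If a second ever held more than 10 rows, the offsets would spill into the next second, so checking the group sizes first may be worthwhile.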