Missing samples of a dataframe in pandas


Question

My df:

In [163]: df.head()
Out[163]: 
                       x-axis    y-axis    z-axis
time   
2017-07-27 06:23:08 -0.107666 -0.068848  0.963623
2017-07-27 06:23:08 -0.105225 -0.070068  0.963867
.....

I set the index as datetime. The sampling rate (nominally 10 Hz) is not constant throughout the dataframe, so some seconds contain only 8 or 9 samples.


  1. I would like to specify the milliseconds in the timestamps (06:23:08.100, 06:23:08.200, etc.)

  2. I would also like to interpolate the missing samples.

Any ideas on how to do this in pandas?

Recommended answer

First, let's create some sample data that resembles yours.

import pandas as pd
from datetime import timedelta
from datetime import datetime

base = datetime.now()
date_list = [base - timedelta(days=x) for x in range(0, 2)]
values = [v for v in range(2)]
df = pd.DataFrame.from_dict({'Date': date_list, 'values': values})

df = df.set_index('Date')
df

                           values
Date    
2017-08-18 20:42:08.563878  0
2017-08-17 20:42:08.563878  1

Now we create another data frame with one row every 100 milliseconds between the earliest and latest timestamps.

min_val = df.index.min()
max_val = df.index.max()

all_val = []
while min_val <= max_val:
    all_val.append(min_val)
    min_val += timedelta(milliseconds=100)
# len(all_val) 864001 
df_new = pd.DataFrame.from_dict({'Date': all_val})
df_new = df_new.set_index('Date')
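The same 100 ms grid can also be built without an explicit loop. The following is a minimal sketch using `pd.date_range`; the `start` and `end` values are hypothetical stand-ins for `df.index.min()` and `df.index.max()`, and `grid` and `df_grid` are illustrative names that do not appear in the original answer.

```python
import pandas as pd

# Hypothetical endpoints standing in for df.index.min() and df.index.max().
start = pd.Timestamp("2017-08-17 20:42:08")
end = pd.Timestamp("2017-08-18 20:42:08")

# One timestamp every 100 milliseconds, with both endpoints included.
grid = pd.date_range(start=start, end=end, freq="100ms", name="Date")

# An empty data frame that only carries the regular index.
df_grid = pd.DataFrame(index=grid)

print(len(df_grid))
```

For a span of exactly one day this yields the same number of rows as the loop above, since both include the first and the last timestamp.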



Let's join both data frames, so that every missing row has an index entry but no value.

final_df = df_new.join(df)
final_df

                            values
Date    
2017-08-17 20:42:08.563878  1.0
2017-08-17 20:42:08.663878  NaN
2017-08-17 20:42:08.763878  NaN
2017-08-17 20:42:08.863878  NaN
2017-08-17 20:42:08.963878  NaN
2017-08-17 20:42:09.063878  NaN
2017-08-17 20:42:09.163878  NaN
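One thing to keep in mind is that a join on the index only matches timestamps that are exactly equal. The sketch below uses made-up data (the timestamps, the `samples` frame and the name `n_missing` are all assumptions for illustration) to show that a sample lying between two grid points does not appear in the joined result, and how the number of empty rows could be counted.

```python
import pandas as pd

# A regular grid of four timestamps, 100 ms apart (illustrative values).
grid = pd.date_range("2017-07-27 06:23:08", periods=4, freq="100ms", name="Date")
df_grid = pd.DataFrame(index=grid)

# Two hypothetical samples: one on the grid, one 50 ms off the grid.
samples = pd.DataFrame(
    {"values": [1.0, 2.0]},
    index=pd.DatetimeIndex(
        ["2017-07-27 06:23:08.000", "2017-07-27 06:23:08.250"], name="Date"
    ),
)

# Left join: every grid row is kept, values are filled only on exact matches.
joined = df_grid.join(samples)

# Rows on the grid that received no value.
n_missing = int(joined["values"].isna().sum())
print(joined)
print(n_missing)
```

If the real timestamps do not fall exactly on the grid, they may need to be rounded or otherwise aligned before joining; how to do that depends on the data.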

Now interpolate the data:

final_df.interpolate()

                            values
Date    
2017-08-17 20:42:08.563878  1.000000
2017-08-17 20:42:08.663878  0.999999
2017-08-17 20:42:08.763878  0.999998
2017-08-17 20:42:08.863878  0.999997
2017-08-17 20:42:08.963878  0.999995
2017-08-17 20:42:09.063878  0.999994
2017-08-17 20:42:09.163878  0.999993
2017-08-17 20:42:09.263878  0.999992

Some interpolation strategies: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html
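By default, `interpolate()` uses the linear method, which treats rows as equally spaced and ignores the index. When the index is a `DatetimeIndex`, `method="time"` weights by the actual time gaps instead. The following is a small sketch on made-up data (the series `s` and its values are assumptions, not the asker's measurements) that reindexes onto a 100 ms grid and then fills the gaps.

```python
import pandas as pd

# Hypothetical samples with a gap: .200 and .300 are missing.
idx = pd.to_datetime([
    "2017-07-27 06:23:08.000",
    "2017-07-27 06:23:08.100",
    "2017-07-27 06:23:08.400",
])
s = pd.Series([0.0, 1.0, 4.0], index=idx)

# Regular 100 ms grid covering the same span.
grid = pd.date_range(idx[0], idx[-1], freq="100ms")

# reindex inserts NaN at the missing grid points;
# method="time" then interpolates according to the time differences.
filled = s.reindex(grid).interpolate(method="time")
print(filled)
```

On a perfectly regular grid such as this one, the linear and time methods give the same result; they differ only when the spacing of the index is uneven.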

UPDATE: As per the discussion in the comments:

Say our initial data does not have millisecond information.

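# Assumes df_new here has a 'Date' column (not the index) holding second-level timestamps.
df_new_date_without_miliseconds = df_new['Date']
df_new_date_without_miliseconds[0] # Timestamp('2017-08-17 21:45:49')

# The second currently being processed.
max_value_date = df_new_date_without_miliseconds[0]
# The last timestamp emitted within that second.
max_value_miliseconds = df_new_date_without_miliseconds[0]

updated_dates = []
for val in df_new_date_without_miliseconds:
    if val == max_value_date:
        # Same second as before: step 100 ms past the previous timestamp.
        val = max_value_miliseconds + timedelta(milliseconds=100)
        max_value_miliseconds = val
    elif val > max_value_date:
        # A new second starts: keep the timestamp as is and reset the counter.
        max_value_date = val
        max_value_miliseconds = val
    updated_dates.append(val)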

output:

[Timestamp('2017-08-17 21:45:49.100000'),
 Timestamp('2017-08-17 21:45:49.200000'),
 Timestamp('2017-08-17 21:45:49.300000'),
 Timestamp('2017-08-17 21:45:50'),
 Timestamp('2017-08-17 21:45:50.100000'),

Assign the new values to the DataFrame:

df_new['Date'] = updated_dates
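A vectorized alternative to the loop is sketched below. It is not the answer's method: it numbers the rows within each second using `groupby(...).cumcount()` and adds that many 100 ms steps. The series `dates` is made-up example data, and `position` and `updated` are illustrative names. Unlike the loop above, every second here starts at `.000` rather than `.100` for the first one.

```python
import pandas as pd

# Hypothetical timestamps with second-level resolution only.
dates = pd.Series(pd.to_datetime([
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:49",
    "2017-08-17 21:45:50",
    "2017-08-17 21:45:50",
]))

# Position of each row within its own second: 0, 1, 2, 0, 1.
position = dates.groupby(dates).cumcount()

# Shift the n-th sample of each second by n * 100 ms.
updated = dates + pd.to_timedelta(position * 100, unit="ms")
print(updated)
```

This assumes the rows are already in chronological order and that no second contains more than 10 samples; otherwise the offsets would spill into the next second.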
