Fill a dataframe with NaN when multiple days of data are missing

Problem description

I have a pandas dataframe which I interpolate to get a daily dataframe. The original dataframe looks like this:

               col_1      vals 
2017-10-01  0.000000  0.112869 
2017-10-02  0.017143  0.112869 
2017-10-12  0.003750  0.117274 
2017-10-14  0.000000  0.161556 
2017-10-17  0.000000  0.116264   
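
To make the example reproducible, the sample frame above can be rebuilt like this. It is a sketch that simply copies the table values; the name df matches what the answer's code expects, and the dates become a DatetimeIndex so that asfreq('D') works later.

```python
import pandas as pd

# Sample data copied from the table above; the dates form a DatetimeIndex
df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

print(df)
```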

In the interpolated dataframe, I want to change data values to NaN where the gap in dates exceeds 5 days. For example, in the dataframe above, the gap between 2017-10-02 and 2017-10-12 exceeds 5 days, so in the interpolated dataframe all values between these two dates should be removed. I am not sure how to do this; maybe with combine_first?

The interpolated dataframe looks like this:

            col_1      vals 
2017-10-01  0.000000  0.112869 
2017-10-02  0.017143  0.112869 
2017-10-03  0.015804  0.113309 
2017-10-04  0.014464  0.113750 
2017-10-05  0.013125  0.114190 
2017-10-06  0.011786  0.114631 
2017-10-07  0.010446  0.115071 
2017-10-08  0.009107  0.115512 
2017-10-09  0.007768  0.115953 
2017-10-10  0.006429  0.116393 
2017-10-11  0.005089  0.116834 
2017-10-12  0.003750  0.117274 
2017-10-13  0.001875  0.139415 
2017-10-14  0.000000  0.161556 
2017-10-15  0.000000  0.146459 
2017-10-16  0.000000  0.131361 
2017-10-17  0.000000  0.116264
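
The question does not say how the daily frame was produced. Assuming plain linear interpolation after upsampling to one row per day, a sketch like the following reproduces the table above to the displayed six decimals. It also shows the problem: the 2017-10-03 to 2017-10-11 stretch gets filled even though no observations fall inside it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# Assumption: upsample to calendar days (new rows are NaN), then fill them
# with pandas' default linear interpolation, ignoring gap length entirely
daily = df.asfreq("D").interpolate()

print(daily.round(6))
```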

Expected output:

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264

Recommended answer

I'd first identify where the gaps exceed 5 days. From there, I'd generate an array that labels the groups between such gaps. Finally, I'd use groupby to convert each group to daily frequency and interpolate within it.

import numpy as np

# convenience: assign string to variable for easier access
daytype = 'timedelta64[D]'

# define five days for use when evaluating size of gaps
five = np.array(5, dtype=daytype)

# get the size of gaps
deltas = np.diff(df.index.values).astype(daytype)

# identify groups between gaps
groups = np.append(False, deltas > five).cumsum()

# handy function to turn to daily frequency and interpolate
to_daily = lambda x: x.asfreq('D').interpolate()

# and finally...
df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
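
To see what the grouping step produces, here is a pandas-only variant of the gap detection (a sketch, not the answer's exact code). Index.to_series().diff() gives the gap before each row as a Timedelta, comparing it with pd.Timedelta(days=5) flags the rows that start a new segment, and cumsum() turns those flags into segment labels. For the sample data the labels are [0, 0, 1, 1, 1], so interpolation never crosses the 2017-10-02 to 2017-10-12 gap.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# Gap between each row and the previous one; the first row gets NaT
gaps = df.index.to_series().diff()

# NaT compares as False, so the first row stays in segment 0;
# every gap longer than 5 days bumps the label by one
groups = (gaps > pd.Timedelta(days=5)).cumsum().to_numpy()
print(groups)  # → [0 0 1 1 1]

# Upsample and interpolate each segment on its own, then stitch them together
result = df.groupby(groups, group_keys=False).apply(
    lambda g: g.asfreq("D").interpolate()
)
print(result)
```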


If you want to provide your own interpolation method, you can modify the above like this:

import numpy as np

daytype = 'timedelta64[D]'
five = np.array(5, dtype=daytype)
deltas = np.diff(df.index.values).astype(daytype)
groups = np.append(False, deltas > five).cumsum()

# custom interpolation function that takes a dataframe
def my_interpolate(df):
    """This can be whatever you want.
    I just provided what will result
    in the same thing as before."""
    return df.interpolate()

to_daily = lambda x: x.asfreq('D').pipe(my_interpolate)

df.groupby(groups, group_keys=False).apply(to_daily)

               col_1      vals
2017-10-01  0.000000  0.112869
2017-10-02  0.017143  0.112869
2017-10-12  0.003750  0.117274
2017-10-13  0.001875  0.139415
2017-10-14  0.000000  0.161556
2017-10-15  0.000000  0.146459
2017-10-16  0.000000  0.131361
2017-10-17  0.000000  0.116264
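
The answer drops the rows that fall inside long gaps. If you would rather keep one row per calendar day and set those values to NaN, as the question's title describes, one option (a sketch, not part of the original answer) is to reindex the stitched result onto a full daily date range: days inside a gap longer than 5 days have no row, so reindex fills them with NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
        "vals": [0.112869, 0.112869, 0.117274, 0.161556, 0.116264],
    },
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-02", "2017-10-12", "2017-10-14", "2017-10-17"]
    ),
)

# Label the segments separated by gaps longer than 5 days (same idea as above)
deltas = np.diff(df.index.values).astype("timedelta64[D]")
groups = np.append(False, deltas > np.timedelta64(5, "D")).cumsum()

# Interpolate within each segment only
interpolated = df.groupby(groups, group_keys=False).apply(
    lambda g: g.asfreq("D").interpolate()
)

# Restore every calendar day; days inside a long gap become all-NaN rows
full_range = pd.date_range(df.index.min(), df.index.max(), freq="D")
with_nans = interpolated.reindex(full_range)

print(with_nans)
```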
