NaN in data frame: when the first observation of a time series is NaN, fill it with the first available value, otherwise carry over the last / previous observation

Question

I am performing an ADF test with statsmodels. The value series can have missing observations. In fact, I drop the analysis if the fraction of NaNs is larger than c. However, if the series makes it through that check, I run into the problem that adfuller cannot deal with missing data. Since this is training data with a minimum frame size, I would like to do the following:

1) if x(t=0) = NaN, then find the next non-NaN value (t>0)
2) otherwise, if x(t) = NaN, then x(t) = x(t-1)

So I am compromising my first value here, but making sure the input data always has the same dimension. Alternatively, if the first value is missing, I could fill it with 0, making use of the limit option of fillna.

From the documentation, the different options are not 100% clear to me:

method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None

Method to use for filling holes in reindexed Series.
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use NEXT valid observation to fill gap.

pad / ffill: does that mean I carry over the previous value? backfill / bfill: does that mean the value is taken from a valid one in the future?

df.fillna(method='bfill', limit=1, inplace=True)
df.fillna(method='ffill', inplace=True)

Would that work with limit? The documentation example uses limit=1, but with a predetermined value to be filled in.
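As a quick illustration of what `limit` does, here is a minimal sketch. It assumes a pandas version in which `Series.bfill` accepts `limit`, and the sample values are made up for the example. `limit=1` caps how many consecutive NaNs are filled in every gap, not only at the start of the series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two leading NaNs and a two-NaN gap in the middle.
s = pd.Series([np.nan, np.nan, 1.0, np.nan, np.nan, 2.0])

# bfill with limit=1 fills at most one NaN per run, namely the one
# directly before the next valid value, and it does so in every gap.
limited = s.bfill(limit=1)

# Without a limit, every NaN that has a later valid value is filled.
unlimited = s.bfill()

print(limited.tolist())
print(unlimited.tolist())
```

In this sketch the first observation is still NaN after the limited backfill, so `limit=1` alone would not guarantee a gap-free series.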

Recommended answer

1) if x(t=0) = NaN, then find the next non-NaN value (t>0)
2) otherwise, if x(t) = NaN, then x(t) = x(t-1)

To front-fill all observations except for (possibly) the first ones, which should be backfilled, you can chain two calls to fillna, the first with method='ffill' and the second with method='bfill':

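>>> import pandas as pd
>>> df = pd.DataFrame({'a': [None, None, 1, None, 2, None]})
>>> # Forward-fill first, then backfill whatever is still NaN at the start.
>>> # Newer pandas releases deprecate fillna(method=...); df.ffill().bfill() is the equivalent.
>>> df.fillna(method='ffill').fillna(method='bfill')
     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0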
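
Putting this together with the NaN-fraction check from the question, a minimal sketch might look like the following. The helper name `prepare_series`, the default threshold and the sample data are assumptions made for illustration, not part of the original answer. It uses `ffill()` and `bfill()`, the method forms of the same two fills:

```python
import numpy as np
import pandas as pd


def prepare_series(x: pd.Series, c: float = 0.2):
    """Return a gap-free copy of x, or None if too many values are missing.

    Hypothetical helper: the name and the default threshold are illustrative.
    """
    # Skip the analysis when the fraction of NaNs exceeds the threshold c.
    if x.isna().mean() > c:
        return None
    # Carry the previous observation forward, then backfill any NaNs
    # that remain at the start of the series.
    return x.ffill().bfill()


# Made-up sample with a missing first observation and one gap.
raw = pd.Series([np.nan, 1.0, np.nan, 2.0, 3.0, 4.0])
clean = prepare_series(raw, c=0.5)
print(clean.tolist())
```

The cleaned series keeps the length of the input, so it could then be passed on to `adfuller`. A series that contains only NaNs would stay all-NaN, which is one reason to run the threshold check first.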
