Pandas: interpolation where first and last data point in column is NaN


Question

I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The problem is that the first and last values in the column are often NaN, and there can be many rows before the first non-NaN value appears:

      col 1    col 2
 0    NaN      NaN
 1    NaN      NaN
...
1000   1       NaN
1001  NaN       1   <-----
1002   3       NaN  <----- only want to fill in these 'in between value' rows
1003   4        3
...
3999  NaN      NaN
4000  NaN      NaN

I am tying together a dataset that is updated 'on event', but separately for each column, and is indexed by Timestamp. As a result, there are often rows where no data was recorded for some columns, hence the many NaNs.

Recommended answer

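I select the rows between the column's minimum and maximum values with idxmin and idxmax, then forward-fill that range (fillna with method='ffill'). Note that idxmin and idxmax locate the smallest and largest values, so they coincide with the first and last valid rows only when the values are increasing, as they are in this sample.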

print(df)
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001    NaN      1
#1002      3    NaN
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

# Forward-fill only the rows between the labels returned by idxmin() and idxmax().
# .ffill() is equivalent to fillna(method='ffill'), which recent pandas versions deprecate.
start, end = df['col 1'].idxmin(), df['col 1'].idxmax()
df.loc[start:end] = df.loc[start:end].ffill()
start, end = df['col 2'].idxmin(), df['col 2'].idxmax()
df.loc[start:end] = df.loc[start:end].ffill()
print(df)
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001      1      1
#1002      3      1
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN
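
The forward-fill approach above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the helper name ffill_inside is hypothetical, the sample frame is rebuilt from the rows shown in the question, and first_valid_index / last_valid_index are swapped in for idxmin / idxmax so that the result does not depend on the values being increasing.

```python
import numpy as np
import pandas as pd


def ffill_inside(col: pd.Series) -> pd.Series:
    """Forward-fill NaNs only between the first and last valid values.

    Hypothetical helper, not part of pandas.
    """
    first, last = col.first_valid_index(), col.last_valid_index()
    if first is None:  # the column is entirely NaN, nothing to fill
        return col
    out = col.copy()
    # Label-based slicing with .loc includes both endpoints.
    out.loc[first:last] = out.loc[first:last].ffill()
    return out


# Sample data reconstructed from the rows shown in the question.
df = pd.DataFrame(
    {
        "col 1": [np.nan, np.nan, 1, np.nan, 3, 4, np.nan, np.nan],
        "col 2": [np.nan, np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan],
    },
    index=[0, 1, 1000, 1001, 1002, 1003, 3999, 4000],
)

# DataFrame.apply passes each column to the helper in turn.
filled = df.apply(ffill_inside)
print(filled)
```

Because each column is handled separately, the leading and trailing NaNs of one column are never touched by the bounds of another.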

Added a different solution, thanks to HStro. It interpolates only between the first and last valid index of the column:

# Assign through a single .loc call; the chained form df['col 1'].loc[...] = ... may not update df.
first, last = df['col 1'].first_valid_index(), df['col 1'].last_valid_index()
df.loc[first:last, 'col 1'] = df.loc[first:last, 'col 1'].astype(float).interpolate()
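
As a possible alternative that the original answer does not mention, interpolate accepts a limit_area parameter, and limit_area='inside' restricts filling to NaNs that are surrounded by valid values. The sketch below assumes a pandas version that supports this parameter and rebuilds the sample frame from the rows shown in the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the rows shown in the question.
df = pd.DataFrame(
    {
        "col 1": [np.nan, np.nan, 1, np.nan, 3, 4, np.nan, np.nan],
        "col 2": [np.nan, np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan],
    },
    index=[0, 1, 1000, 1001, 1002, 1003, 3999, 4000],
)

# limit_area="inside" leaves leading and trailing NaNs untouched and
# interpolates only the gaps between valid values, column by column.
# method="linear" ignores the index and treats rows as equally spaced.
interpolated = df.interpolate(method="linear", limit_area="inside")
print(interpolated)
```

For data indexed by Timestamp, as in the question, method='time' may be a better fit than 'linear', since it weights the interpolation by the time between rows; that choice depends on the data and is not something the original answer specifies.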
