Impute missing values with the mean of the nearest neighbors in a column

Question

I have a DataFrame:

import pandas as pd

df = pd.DataFrame(data=[676, 0, 670, 0, 668], index=['2012-01-31 00:00:00','2012-02-29 00:00:00',
                                                     '2012-03-31 00:00:00','2012-04-30 00:00:00',
                                                     '2012-05-31 00:00:00'])
df.index.name = "Date"
df.columns = ["Number"]

It looks like this:

                     Number
Date
2012-01-31 00:00:00     676
2012-02-29 00:00:00       0
2012-03-31 00:00:00     670
2012-04-30 00:00:00       0
2012-05-31 00:00:00     668

How can I impute the 2nd and 4th values with (676+670)/2 and (670+668)/2, respectively?

I could save the values as an np.array and impute them in the array, but that's ridiculous!

Answer

I replace every 0 with np.nan so that pandas treats those entries as missing, then use the where method to keep the non-missing values and take the rest from other. With ffill and bfill, each NaN receives the nearest previous and the nearest following valid value; adding the two and dividing by 2 gives the mean of the nearest neighbors.
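
To see what ffill and bfill contribute, here is a minimal sketch of the intermediate steps. It assumes a plain Series holding the question's values instead of the dated DataFrame; the names s, missing, prev_valid, next_valid and the column labels are only for illustration.

```python
import numpy as np
import pandas as pd

# The question's values; 0 marks a missing entry (assumption carried over from the question).
s = pd.Series([676, 0, 670, 0, 668], name="Number")

missing = s.replace(0, np.nan)   # 0 -> NaN so ffill/bfill can see the gaps
prev_valid = missing.ffill()     # last valid value at or before each position
next_valid = missing.bfill()     # next valid value at or after each position
neighbor_mean = (prev_valid + next_valid) / 2

# Show every step side by side.
steps = pd.DataFrame({
    "original": s,
    "ffill": prev_valid,
    "bfill": next_valid,
    "mean": neighbor_mean,
})
print(steps)
```

At positions that were never missing, ffill and bfill both return the value itself, so the mean column equals the original there and only the former zeros change (to 673.0 and 669.0).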

import numpy as np

nan_df = df.replace(to_replace=0, value=np.nan)  # treat 0 as missing
nan_df.where(nan_df.notna(), other=(nan_df.ffill() + nan_df.bfill()) / 2)  # boolean condition: keep valid values

                     Number
Date                       
2012-01-31 00:00:00   676.0
2012-02-29 00:00:00   673.0
2012-03-31 00:00:00   670.0
2012-04-30 00:00:00   669.0
2012-05-31 00:00:00   668.0
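
The same idea can be wrapped in a small helper and checked on edge cases. This is a sketch: fill_with_neighbor_mean is a hypothetical name, and treating 0 as the missing marker is an assumption taken from the question. For comparison it also runs interpolate(), which is linear interpolation, a different technique that matches the neighbor mean only when a gap is a single row.

```python
import numpy as np
import pandas as pd


def fill_with_neighbor_mean(df: pd.DataFrame, missing=0) -> pd.DataFrame:
    """Hypothetical helper: replace `missing` markers with the mean of the
    nearest valid values above and below (ffill/bfill approach)."""
    nan_df = df.replace(to_replace=missing, value=np.nan)
    neighbor_mean = (nan_df.ffill() + nan_df.bfill()) / 2
    # Keep valid entries; take the neighbor mean where the entry is NaN.
    return nan_df.where(nan_df.notna(), other=neighbor_mean)


# Edge cases: a zero in the first row and a run of two zeros.
edge = pd.DataFrame({"Number": [0, 676, 0, 0, 670]})
print(fill_with_neighbor_mean(edge))

# Linear interpolation on the same data, for comparison only.
print(edge.replace(0, np.nan).interpolate())
```

With a run of two zeros, the helper gives both rows the mean of the bounding values (673.0), whereas interpolate() spreads them evenly (674.0 and 672.0). A zero in the first or last row has no valid neighbor on one side, so the helper leaves it as NaN; in this example interpolate() also leaves the leading NaN unfilled.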
