Impute missing values with the mean of the nearest neighbors in a column
Problem Description
I have a DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[676, 0, 670, 0, 668],
                  index=['2012-01-31 00:00:00', '2012-02-29 00:00:00',
                         '2012-03-31 00:00:00', '2012-04-30 00:00:00',
                         '2012-05-31 00:00:00'])
df.index.name = "Date"
df.columns = ["Number"]
It looks like this:
Number
Date
2012-01-31 00:00:00 676
2012-02-29 00:00:00 0
2012-03-31 00:00:00 670
2012-04-30 00:00:00 0
2012-05-31 00:00:00 668
The zeros are missing values. How can I impute the 2nd and 4th values with (676+670)/2 and (670+668)/2, respectively?
I could save the values as an np.array and fill them in by hand, but that's ridiculous!
Recommended Answer
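I use the replace method to turn every 0 into np.nan, then the where method to keep the original non-missing values and substitute the rest. Once the zeros are NaN, ffill and bfill fill each NaN with the previous and the following valid value, respectively; adding the two results and dividing by 2 gives the mean of the nearest neighbors. (The older fillna(method='ffill') / fillna(method='bfill') spelling does the same thing but is deprecated in recent pandas.)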
s = df.replace(to_replace=0, value=np.nan)  # zeros become missing
# keep non-missing values, fill the rest with the mean of previous and next values
s.where(s.notna(), other=(s.ffill() + s.bfill()) / 2)
Number
Date
2012-01-31 00:00:00 676.0
2012-02-29 00:00:00 673.0
2012-03-31 00:00:00 670.0
2012-04-30 00:00:00 669.0
2012-05-31 00:00:00 668.0
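The same idea can be wrapped in a small reusable function. This is a sketch, not the answerer's code: the name `fill_zeros_with_neighbor_mean` is hypothetical, and it assumes, as in the question, that 0 marks a missing value. It also contrasts the neighbor-mean fill with pandas' linear `interpolate()`, which gives the same result for a single isolated gap but differs for runs of consecutive zeros.

```python
import numpy as np
import pandas as pd


def fill_zeros_with_neighbor_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: treat 0 as missing and replace it with the mean
    of the nearest non-missing values before and after it in each column."""
    # Mark zeros as missing so ffill/bfill can skip over them.
    masked = frame.replace(to_replace=0, value=np.nan)
    # Previous valid value (ffill) and next valid value (bfill), averaged.
    neighbor_mean = (masked.ffill() + masked.bfill()) / 2
    # Keep the original non-missing values; use the neighbor mean elsewhere.
    return masked.where(masked.notna(), other=neighbor_mean)


df = pd.DataFrame(
    data=[676, 0, 670, 0, 668],
    index=pd.Index(['2012-01-31', '2012-02-29', '2012-03-31',
                    '2012-04-30', '2012-05-31'], name="Date"),
    columns=["Number"],
)
print(fill_zeros_with_neighbor_mean(df))

# Consecutive zeros all get the same value (mean of the two outer neighbors),
# whereas linear interpolation spreads them evenly between the neighbors.
runs = pd.DataFrame({"Number": [10, 0, 0, 40]})
print(fill_zeros_with_neighbor_mean(runs)["Number"].tolist())  # [10.0, 25.0, 25.0, 40.0]
print(runs.replace(0, np.nan).interpolate()["Number"].tolist())  # [10.0, 20.0, 30.0, 40.0]
```

A zero at the very start or end of a column has only one neighbor, so `ffill() + bfill()` is NaN there and the value stays NaN; chain an extra `.ffill().bfill()` on the result if you would rather copy the single available neighbor.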