How to replace NA (missing values) in a data frame with neighbouring values
Question
862 2006-05-19 6.241603 5.774208
863 2006-05-20 NA NA
864 2006-05-21 NA NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27 NA NA
I have a data frame x like the one above with some NAs, which I want to fill using the neighbouring non-NA values. For example, 2006-05-20 would get the average of the values for 2006-05-19 and 2006-05-22.
How can I do this?
Answer
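As a worked check of the averaging rule described above, here is a minimal base-R sketch. It assumes a plain numeric vector (the first value column from the question). The helper name `fill_neighbour_mean` is hypothetical and not part of the question or answer. For 2006-05-20 and 2006-05-21 the rule gives (6.241603 + 6.383929) / 2 = 6.312766.

```r
# Hypothetical helper (not from the question or answer): replace each NA with the
# mean of the nearest non-NA value before it and the nearest non-NA value after it.
# An NA with no observed value on one side is left as NA.
fill_neighbour_mean <- function(v) {
  ok <- which(!is.na(v))            # positions of observed values
  out <- v
  for (i in which(is.na(v))) {
    before <- ok[ok < i]
    after  <- ok[ok > i]
    if (length(before) > 0 && length(after) > 0) {
      out[i] <- (v[max(before)] + v[min(after)]) / 2
    }
  }
  out
}

# First value column from the question.
vals <- c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616, 6.370312, 6.225175, NA)
filled <- fill_neighbour_mean(vals)
filled[2:3]   # both 6.312766, i.e. (6.241603 + 6.383929) / 2
filled[9]     # NA: nothing is observed after 2006-05-27
```

Every NA in a gap gets the same value under this rule. To fill both value columns, apply it column-wise, e.g. `X[3:4] <- lapply(X[3:4], fill_neighbour_mean)`, assuming the values sit in columns 3 and 4.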
Properly formatted, your data looks like this:
862 2006-05-19 6.241603 5.774208
863 2006-05-20 NA NA
864 2006-05-21 NA NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27 NA NA
It is of a time-series nature, so I would load it into an object of class zoo
(from the zoo package), as that lets you pick from a number of strategies, shown below. Which one you pick depends on the nature of your data and your application. In general, the field of 'figuring missing data out' is called data imputation,
and there is a rather large literature on it.
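The session below indexes a data frame `X` (column 2 for dates, columns 3:4 for values) without showing how it was created. A minimal sketch that rebuilds it from the data above, assuming whitespace-separated input and the hypothetical column names `id`, `date`, `x`, `y` (the printout labels the value columns x and y; the other names are guesses):

```r
# Rebuild the example data as a data frame X. The column names id, date, x, y are
# assumptions; the question shows no header line.
X <- read.table(text = "
862 2006-05-19 6.241603 5.774208
863 2006-05-20 NA NA
864 2006-05-21 NA NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27 NA NA
", col.names = c("id", "date", "x", "y"), stringsAsFactors = FALSE)

str(X)   # date is read as character; as.Date(X[, 2]) converts it
```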
R> library(zoo)
R> x <- zoo(X[,3:4], order.by=as.Date(X[,2]))
R> x
x y
2006-05-19 6.242 5.774
2006-05-20 NA NA
2006-05-21 NA NA
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 NA NA
R> na.locf(x) # last observation carried forward
x y
2006-05-19 6.242 5.774
2006-05-20 6.242 5.774
2006-05-21 6.242 5.774
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 6.225 5.782
R> na.approx(x) # linear interpolation between the before/after values
x y
2006-05-19 6.242 5.774
2006-05-20 6.289 5.818
2006-05-21 6.336 5.862
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
R> na.spline(x) # spline fit ...
x y
2006-05-19 6.242 5.774
2006-05-20 5.585 5.159
2006-05-21 5.797 5.358
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 5.973 5.716
R>
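To see what the first two strategies compute, here is a minimal base-R sketch for a single numeric vector. It is an illustration under that assumption, not zoo's implementation; `locf` and `interp_linear` are hypothetical names, and `approx()` is the standard stats function.

```r
# Base-R sketches of two of the zoo strategies, for a single numeric vector.
# Illustrations only; these are not zoo's own implementations.

# Last observation carried forward (cf. na.locf): each NA takes the most recent
# observed value. Leading NAs have nothing to carry and stay NA.
locf <- function(v) {
  idx <- cumsum(!is.na(v))     # number of values observed up to each position
  obs <- v[!is.na(v)]
  out <- rep(NA_real_, length(v))
  out[idx > 0] <- obs[idx[idx > 0]]
  out
}

# Linear interpolation (cf. na.approx) via stats::approx(). rule = 1 returns NA
# outside the observed range instead of extrapolating.
interp_linear <- function(v, t = seq_along(v)) {
  ok <- !is.na(v)
  approx(t[ok], v[ok], xout = t, rule = 1)$y
}

vals <- c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616, 6.370312, 6.225175, NA)
by_locf   <- locf(vals)
by_linear <- interp_linear(vals)
round(by_linear[2:3], 3)   # 6.289 6.336, matching the na.approx() output above
```

The default `t = seq_along(v)` works here only because the dates are consecutive days; for irregular dates pass `as.numeric(as.Date(...))` instead. Unlike `na.approx()`, which by default drops the trailing 2006-05-27 row, this sketch keeps that position as NA.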