How to replace NA (missing values) in a data frame with neighbouring values

Question

862 2006-05-19 6.241603 5.774208     
863 2006-05-20 NA       NA      
864 2006-05-21 NA       NA      
865 2006-05-22 6.383929 5.906426      
866 2006-05-23 6.782068 6.268758      
867 2006-05-24 6.534616 6.013767      
868 2006-05-25 6.370312 5.856366      
869 2006-05-26 6.225175 5.781617      
870 2006-05-27 NA       NA     

I have a data frame x like the one above with some NA values, which I want to fill using the neighbouring non-NA values. For example, the value for 2006-05-20 would be the average of the values for 2006-05-19 and 2006-05-22.

How can this be done?

Recommended answer

Properly formatted, your data looks like this

862 2006-05-19 6.241603 5.774208 
863 2006-05-20 NA       NA 
864 2006-05-21 NA       NA 
865 2006-05-22 6.383929 5.906426 
866 2006-05-23 6.782068 6.268758 
867 2006-05-24 6.534616 6.013767 
868 2006-05-25 6.370312 5.856366 
869 2006-05-26 6.225175 5.781617 
870 2006-05-27 NA       NA

and is of a time-series nature. So I would load it into an object of class zoo (from the zoo package), as that lets you choose among several strategies; see below. Which one you pick depends on the nature of your data and application. In general, the task of filling in missing data is called data imputation, and there is a rather large literature on it.
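The session below assumes a data frame X holding the rows shown above. Here is a minimal sketch of how it might be built. The column names id, date, x and y are assumptions (the original only shows the labels x and y in the zoo output), and the values are copied from the question.

```r
# Rebuild the sample data as a data frame.
# Column names are assumed for illustration; only x and y appear in the original output.
X <- data.frame(
  id   = 862:870,
  date = seq(as.Date("2006-05-19"), by = "day", length.out = 9),
  x    = c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616, 6.370312, 6.225175, NA),
  y    = c(5.774208, NA, NA, 5.906426, 6.268758, 6.013767, 5.856366, 5.781617, NA)
)

# Columns 3 and 4 carry the measurements, column 2 the dates,
# which is the layout that X[,3:4] and X[,2] rely on.
str(X)
```

With the dates stored as Date values, as.Date(X[,2]) in the zoo() call simply passes them through.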

R> library(zoo)
R> x <- zoo(X[,3:4], order.by=as.Date(X[,2]))
R> x
               x     y
2006-05-19 6.242 5.774
2006-05-20    NA    NA
2006-05-21    NA    NA
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27    NA    NA
R> na.locf(x)  # last observation carried forward
               x     y
2006-05-19 6.242 5.774
2006-05-20 6.242 5.774
2006-05-21 6.242 5.774
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 6.225 5.782
R> na.approx(x)  # approximation based on before/after values
               x     y
2006-05-19 6.242 5.774
2006-05-20 6.289 5.818
2006-05-21 6.336 5.862
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
R> na.spline(x)   # spline fit ...
               x     y
2006-05-19 6.242 5.774
2006-05-20 5.585 5.159
2006-05-21 5.797 5.358
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 5.973 5.716
R> 
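None of the three fills above returns exactly what the question asked for, the plain average of the nearest non-NA value on each side. A minimal base R sketch of that rule follows, with linear interpolation alongside for comparison. The helpers fill_forward, fill_backward and neighbour_mean are hypothetical names introduced here, not zoo functions, and the sample vector is the x column from the question.

```r
# Values of column x from the question, one per day starting 2006-05-19.
dates <- seq(as.Date("2006-05-19"), by = "day", length.out = 9)
v <- c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616, 6.370312, 6.225175, NA)

# Carry the last non-NA value forward; a leading gap stays NA.
fill_forward <- function(v) {
  seen <- cumsum(!is.na(v))        # count of non-NA values seen so far
  c(NA, v[!is.na(v)])[seen + 1]    # seen == 0 selects the leading NA
}

# Carry the next non-NA value backward by reversing twice.
fill_backward <- function(v) rev(fill_forward(rev(v)))

# Replace each NA with the mean of its nearest non-NA neighbours.
# A gap at either end has only one neighbour and is left as NA.
neighbour_mean <- function(v) {
  filled <- (fill_forward(v) + fill_backward(v)) / 2
  ifelse(is.na(v), filled, v)
}

averaged <- neighbour_mean(v)

# Linear interpolation over the dates with base approx(), for comparison.
ok <- !is.na(v)
interpolated <- approx(as.numeric(dates[ok]), v[ok],
                       xout = as.numeric(dates))$y

print(data.frame(date = dates,
                 averaged = round(averaged, 3),
                 interpolated = round(interpolated, 3)))
```

Under the averaging rule both days in the gap get 6.313, whereas interpolation gives 6.289 and 6.336, matching the na.approx output above. The trailing NA on 2006-05-27 stays NA in both cases because it has no later neighbour. If zoo is already loaded, a similar result could probably be obtained by averaging na.locf(x, na.rm = FALSE) with na.locf(x, fromLast = TRUE, na.rm = FALSE).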
