Imputing missing values keeping circular trend in mind


Problem description

Think of a picture of a sunrise in which a red circle is surrounded by a thick yellow ring, which is in turn surrounded by a blue background. Encode red as 3, yellow as 2, and blue as 1.

 11111111111
 11111211111
 11112221111
 11222322211
 22223332222
 11222322221
 11112221111
 11111211111

This is the desired output. However, the record/file/data has missing values (30% of all elements are missing).

How can we impute the missing values so as to recover this desired output, keeping the circular trend in mind?

Recommended answer

This is how I would solve a problem of this sort in a very simple, straightforward way. Please note that I corrected your sample data above to be symmetric:

d <- read.csv(header=FALSE, stringsAsFactors=FALSE, text="
1,1,1,1,1,1,1,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,2,2,2,3,2,2,2,1,1
2,2,2,2,3,3,3,2,2,2,2
1,1,2,2,2,3,2,2,2,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
")

library(raster)

##  Plot original data as raster:
d <- raster(as.matrix(d))
plot(d, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Simulate 30% missing data:
d_m <- d
d_m[ sample(ncell(d), round(0.3 * ncell(d))) ] <- NA
plot(d_m, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Construct a 3x3 filter for mean filling of missing values:
filter <- matrix(1, nrow=3, ncol=3) 

##  Fill in only missing values with the mean of the values within
##    the 3x3 moving window specified by the filter.  Note that this
##    could be replaced with a median/mode or some other whole-number
##    generating summary statistic:
r <- focal(d_m, filter, mean, na.rm=TRUE, NAonly=TRUE, pad=TRUE)

##  Plot imputed data:
plot(r, col=colorRampPalette(c("blue","yellow","red"))(255), zlim=c(1,3))

[Figure: the original sample data]

[Figure: the same data with 30% missing values simulated]

[Figure: the result, with only the missing values interpolated using the mean of the 3x3 moving window]
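The comment in the code above notes that the mean could be swapped for a median or mode so that the filled cells stay whole numbers (1, 2, or 3). Below is a minimal base-R sketch of that idea, assuming a plain numeric matrix rather than a RasterLayer. The helper names `mode_value` and `fill_na_3x3` are hypothetical and not part of the raster package.

```r
## Hypothetical helper: most frequent value in x.
## Ties are broken in favour of the smallest value.
mode_value <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

## Hypothetical helper: fill only the NA cells of a matrix with a summary
## of the non-NA values in the surrounding 3x3 window.  The window is
## clipped at the matrix edges, and all lookups use the original matrix,
## so earlier fills do not influence later ones.
fill_na_3x3 <- function(m, stat = mode_value) {
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (is.na(m[i, j])) {
        rows <- max(1, i - 1):min(nrow(m), i + 1)
        cols <- max(1, j - 1):min(ncol(m), j + 1)
        neighbours <- m[rows, cols]
        neighbours <- neighbours[!is.na(neighbours)]
        ## Leave the cell as NA if every cell in the window is missing
        if (length(neighbours) > 0) out[i, j] <- stat(neighbours)
      }
    }
  }
  out
}

## Small example: the centre cell is missing
m <- matrix(c(1, 1,  1,
              1, NA, 1,
              2, 2,  2), nrow = 3, byrow = TRUE)
filled_mode <- fill_na_3x3(m)          # mode keeps a whole-number class
filled_mean <- fill_na_3x3(m, mean)    # mean may give a fractional value
```

With the mode, the filled cell takes one of the existing class values. With the mean, it can land between classes, so rounding the result is another option if whole numbers are required.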
