使用线性近似法估算NA观测值的边界 [英] Imputation for bounding NA observations using a linear approximation

查看:92
本文介绍了使用线性近似法估算NA观测值的边界的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用以下两个非NA观测值的线性近似值,在阵列的开头为NA观测值插值,以推断缺失值.然后使用前面的两个非NA观测值对阵列末尾的NA观测值执行相同的操作.

I would like to impute values for NA observations at the beginning of the array, using a linear approximation of the following two non-NA observations to extrapolate the missing value. Then do the same for the NA observations at the end of the array, using the preceding two non-NA observations.

我的df的可复制示例:

A reproducible example of my df:

M=matrix(sample(1:9,10*10,T),10);M[sample(1:length(M),0.5*length(M),F)]=NA;dimnames(M)=list(paste(rep("City",dim(M)[1]),1:dim(M)[1],sep=""),paste(rep("Year",dim(M)[2]),1:dim(M)[2],sep=""))
    M

       Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1     NA     4     5    NA     3    NA    NA    NA     5     NA
City2      6    NA     3     3    NA     4     6    NA    NA      7
City3     NA     7    NA     8     8    NA    NA     8    NA      5
City4      3     5     3    NA    NA     3     5     9     8      7
City5      4     6     6    NA    NA     8    NA     7     1     NA
City6     NA    NA    NA    NA     4    NA     8     3     6      7
City7      9     3    NA    NA    NA    NA    NA     4    NA     NA
City8      5     6     9     8     5    NA    NA     1     4     NA
City9     NA    NA     6    NA     3     3     8    NA     7     NA
City10    NA    NA    NA    NA    NA    NA    NA    NA    NA      1

idx=rowSums(!is.na(M))>=2 # Index of rows with 2 or more non-NA to run na.approx

library(zoo)
M[idx,]=t(na.approx(t(M[idx,]),rule=1,method="linear")) # I'm using t as na.approx works on columns

       Year1 Year2 Year3 Year4    Year5 Year6 Year7 Year8 Year9 Year10
City1     NA   4.0     5   4.0 3.000000  3.50   4.0   4.5     5     NA
City2    6.0   5.5     3   3.0 5.500000  4.00   6.0   6.0     6      7
City3    4.5   7.0     3   8.0 8.000000  3.50   5.5   8.0     7      5
City4    3.0   5.0     3   8.0 6.666667  3.00   5.0   9.0     8      7
City5    4.0   6.0     6   8.0 5.333333  8.00   6.5   7.0     1      7
City6    6.5   4.5     7   8.0 4.000000  6.75   8.0   3.0     6      7
City7    9.0   3.0     8   8.0 4.500000  5.50   8.0   4.0     5     NA
City8    5.0   6.0     9   8.0 5.000000  4.25   8.0   1.0     4     NA
City9     NA    NA     6   4.5 3.000000  3.00   8.0   7.5     7     NA
City10    NA    NA    NA    NA       NA    NA    NA    NA    NA      1

我想基于两个先前/以下观察值,使用线性近似来推断边界(对于City1City9).例如,M[1,1]应该为3,而M[1,10]应该为5,5.

I would like to extrapolate the boundaries (for City1 and City9) using a linear approximation based on the two preceding/following observations. For example M[1,1] should be 3 and M[1,10] should be 5,5.

你知道我该怎么做吗?

推荐答案

这将为您提供第一列,其中填充了NA的线性外推值.您可以改写最后一栏.

This gives you the first column with the linearly-extrapolated values filled in for NA. You can adapt for the last column.

firstNAfill <- function(x) {
  ans <- ifelse(!is.na(x[1]),
                x[1],
                ifelse(sum(!is.na(x))<2, NA,
                       2*x[which(!is.na(x[1, ]))[1]] - x[which(!is.na(x[1, ]))[2]]
                )
  )
  return(ans)
}


dat$Year1 <- unlist(lapply(seq(1:nrow(dat)), function(x) {firstNAfill(dat[x, ])}))

结果:

       Year1 Year2 Year3 Year4    Year5 Year6 Year7 Year8 Year9 Year10
City1    3.0   4.0     5   4.0 3.000000  3.50   4.0   4.5     5     NA
City2    6.0   5.5     3   3.0 5.500000  4.00   6.0   6.0     6      7
City3    4.5   7.0     3   8.0 8.000000  3.50   5.5   8.0     7      5
City4    3.0   5.0     3   8.0 6.666667  3.00   5.0   9.0     8      7
City5    4.0   6.0     6   8.0 5.333333  8.00   6.5   7.0     1      7
City6    6.5   4.5     7   8.0 4.000000  6.75   8.0   3.0     6      7
City7    9.0   3.0     8   8.0 4.500000  5.50   8.0   4.0     5     NA
City8    5.0   6.0     9   8.0 5.000000  4.25   8.0   1.0     4     NA
City9    7.5    NA     6   4.5 3.000000  3.00   8.0   7.5     7     NA
City10    NA    NA    NA    NA       NA    NA    NA    NA    NA      1

如果不是NA,则该函数返回第一列的当前值;如果没有两个值可以从中推断,则返回NA;否则,则返回该推断值.

The function returns the first column's current value if not NA, NA if there aren't two values to extrapolate from, and the extrapolated value otherwise.

这篇关于使用线性近似法估算NA观测值的边界的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆