Imputation for bounding NA observations using a linear approximation
Problem description
I would like to impute the NA observations at the beginning of each row by linearly extrapolating from the two following non-NA observations, and then do the same for the NA observations at the end of each row, using the two preceding non-NA observations.
An example of my data, a matrix M (the values are random, so they differ between runs):
M=matrix(sample(1:9,10*10,TRUE),10)                                  # 10 x 10 matrix of random values 1-9
M[sample(1:length(M),0.5*length(M),FALSE)]=NA                        # set half of the entries to NA
dimnames(M)=list(paste0("City",1:nrow(M)),paste0("Year",1:ncol(M)))   # row and column names
M
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4 5 NA 3 NA NA NA 5 NA
City2 6 NA 3 3 NA 4 6 NA NA 7
City3 NA 7 NA 8 8 NA NA 8 NA 5
City4 3 5 3 NA NA 3 5 9 8 7
City5 4 6 6 NA NA 8 NA 7 1 NA
City6 NA NA NA NA 4 NA 8 3 6 7
City7 9 3 NA NA NA NA NA 4 NA NA
City8 5 6 9 8 5 NA NA 1 4 NA
City9 NA NA 6 NA 3 3 8 NA 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
idx=rowSums(!is.na(M))>=2 # Index of rows with 2 or more non-NA to run na.approx
library(zoo)
M[idx,]=t(na.approx(t(M[idx,]),rule=1,method="linear")) # I'm using t as na.approx works on columns
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 NA 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 NA NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
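The boundary NAs remain because rule = 1 only interpolates between observed points and never extrapolates. A minimal sketch of the same interior-only step for the City1 row, using base R's approx() in place of zoo::na.approx() (the row values are copied from M above):

```r
# City1 row of M before imputation
city1 <- c(NA, 4, 5, NA, 3, NA, NA, NA, 5, NA)

pos <- seq_along(city1)   # column positions 1..10 (Year1..Year10)
obs <- !is.na(city1)      # which positions are observed

# Linear interpolation between observed points; with rule = 1, approx()
# returns NA for positions outside the observed range (here 2..9)
filled <- approx(pos[obs], city1[obs], xout = pos, rule = 1)$y
filled
# values: NA 4.0 5.0 4.0 3.0 3.5 4.0 4.5 5.0 NA
```

Year1 and Year10 stay NA, as in the City1 row of the na.approx() output, so filling them needs extrapolation rather than interpolation.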
I would like to extrapolate the boundary values (for City1 and City9) linearly: leading NAs from the two following observations and trailing NAs from the two preceding ones. For example, M[1,1] should be 3 and M[1,10] should be 5.5.
Any idea how I can do this?
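Both expected values come from the straight line through two neighbouring observations. A small sketch of that arithmetic; extrap2() is a hypothetical helper, not part of the question or of zoo, and the inputs are the values after the na.approx() step:

```r
# Hypothetical helper: value at position p on the straight line through
# the points (i, yi) and (j, yj)
extrap2 <- function(i, yi, j, yj, p) {
  slope <- (yj - yi) / (j - i)
  yi + slope * (p - i)
}

# City1: Year2 = 4 and Year3 = 5, so Year1 = 4 - 1
extrap2(2, 4, 3, 5, 1)       # 3
# City1: Year8 = 4.5 and Year9 = 5, so Year10 = 5 + 0.5
extrap2(8, 4.5, 9, 5, 10)    # 5.5
# City9: Year3 = 6 and Year4 = 4.5, so Year1 (two steps back) = 6 + 2 * 1.5
extrap2(3, 6, 4, 4.5, 1)     # 9
```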
Recommended answer
This gives you the first column with the linearly extrapolated values filled in for NA. You can adapt it for the last column.
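# Fill Year1 by stepping back one column from the first two observed values
# in each row (this assumes those values sit in Year2 and Year3)
firstNAfill <- function(x) {
  if (!is.na(x[1])) return(x[1])   # Year1 already observed: keep it
  ok <- which(!is.na(x))           # positions of the observed values
  if (length(ok) < 2) return(NA)  # fewer than two values: cannot extrapolate
  2 * x[ok[1]] - x[ok[2]]          # one linear step back from the first two
}
M[, 1] <- apply(M, 1, firstNAfill)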
Result:
Year1 Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10
City1 3.0 4.0 5 4.0 3.000000 3.50 4.0 4.5 5 NA
City2 6.0 5.5 3 3.0 5.500000 4.00 6.0 6.0 6 7
City3 4.5 7.0 3 8.0 8.000000 3.50 5.5 8.0 7 5
City4 3.0 5.0 3 8.0 6.666667 3.00 5.0 9.0 8 7
City5 4.0 6.0 6 8.0 5.333333 8.00 6.5 7.0 1 7
City6 6.5 4.5 7 8.0 4.000000 6.75 8.0 3.0 6 7
City7 9.0 3.0 8 8.0 4.500000 5.50 8.0 4.0 5 NA
City8 5.0 6.0 9 8.0 5.000000 4.25 8.0 1.0 4 NA
City9 7.5 NA 6 4.5 3.000000 3.00 8.0 7.5 7 NA
City10 NA NA NA NA NA NA NA NA NA 1
The function returns the first column's current value if it is not NA, NA if there are fewer than two values to extrapolate from, and the extrapolated value otherwise.
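One way to do the suggested adaptation for the last column is a mirror image of firstNAfill(). This is a sketch: lastNAfill() is a hypothetical name, and like firstNAfill() it takes a single linear step, so it assumes the last two observed values sit in the two columns just before the end.

```r
# Hypothetical mirror of firstNAfill(): fill the last element of a row by
# taking one linear step past the last two observed values
lastNAfill <- function(x) {
  n <- length(x)
  if (!is.na(x[n])) return(x[n])         # last column already observed
  ok <- which(!is.na(x))                 # positions of observed values
  k <- length(ok)
  if (k < 2) return(NA)                  # too few points to extrapolate
  2 * x[ok[k]] - x[ok[k - 1]]            # one step forward from the last two
}

# City1 row after the question's na.approx() step
city1 <- c(NA, 4, 5, 4, 3, 3.5, 4, 4.5, 5, NA)
lastNAfill(city1)   # 5.5

# On the full matrix: M[, ncol(M)] <- apply(M, 1, lastNAfill)
```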
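firstNAfill() always takes exactly one step back from the first two observed values, so its result is the Year1 value only when those values sit in Year2 and Year3. For City9 (first observations in Year3 and Year4) the 7.5 it writes into Year1 is really the Year2 value, and Year2 stays NA. The sketch below, built around a hypothetical helper fill_ends(), uses the actual column positions instead and fills every leading and trailing NA of a row; rows with fewer than two observations are returned unchanged.

```r
# Hypothetical helper: linearly extrapolate all leading and trailing NAs of a
# numeric vector, using the first two / last two observed values and their
# actual positions. Interior NAs are left untouched.
fill_ends <- function(x) {
  ok <- which(!is.na(x))                  # positions of observed values
  k <- length(ok)
  if (k < 2) return(x)                    # need two points to define a line
  n <- length(x)

  # value at positions p on the line through (i, x[i]) and (j, x[j])
  line_at <- function(i, j, p) x[i] + (x[j] - x[i]) / (j - i) * (p - i)

  if (ok[1] > 1) {                        # leading NAs
    p <- 1:(ok[1] - 1)
    x[p] <- line_at(ok[1], ok[2], p)
  }
  if (ok[k] < n) {                        # trailing NAs
    p <- (ok[k] + 1):n
    x[p] <- line_at(ok[k - 1], ok[k], p)
  }
  x
}

# City1 and City9 rows after the question's na.approx() step
M2 <- rbind(City1 = c(NA, 4, 5, 4, 3, 3.5, 4, 4.5, 5, NA),
            City9 = c(NA, NA, 6, 4.5, 3, 3, 8, 7.5, 7, NA))
M2 <- t(apply(M2, 1, fill_ends))   # apply() returns one column per row
M2
# City1: 3 4 5 4 3 3.5 4 4.5 5 5.5
# City9: 9 7.5 6 4.5 3 3 8 7.5 7 6.5
```

Run after the question's na.approx() step, M <- t(apply(M, 1, fill_ends)) would fill both ends of every row in one pass; City10, with a single observation, would be left as it is.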