Aligning a data frame with missing values
Problem description
I'm using a data frame with many NA values. I can fit a linear model, but afterwards I cannot line the model's fitted values up with the original data: rows with missing values are dropped, and there is no indicator column telling me which rows were kept.
Here's a reproducible example:
library(MASS)
dat <- Aids2
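# Add NA's to column 3 (diag). No seed is set and runif() may repeat an
# index, so slightly fewer than 100 rows can end up NA and counts vary per run.
dat[floor(runif(100, min = 1, max = nrow(dat))),3] <- NA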
# Create a model
model <- lm(death ~ diag + age, data = dat)
# Different Values
length(fitted.values(model))
# 2745
nrow(dat)
# 2843
Recommended answer
There are actually three solutions here:
- pad the fitted values with NA ourselves;
- use predict() to compute fitted values;
- drop incomplete cases ourselves and pass only complete cases to lm().
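Option 1
## indices of the rows that contain NA (assumes NA occurs only in columns the model uses)
id <- attr(na.omit(dat), "na.action")
## start from an all-NA vector with one slot per row of dat
fitted <- rep(NA, nrow(dat))
## fill in the fitted values for the rows the model actually used
fitted[-id] <- model$fitted.values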
nrow(dat)
# 2843
length(fitted)
# 2843
sum(!is.na(fitted))
# 2745
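To line the values up with the original data, the padded vector can be stored as a column of dat. The sketch below is a variation on Option 1, not the answer's exact code. It reads the dropped rows from model$na.action, the record lm() keeps of the rows it omitted, rather than from na.omit(dat). The seed, the use of sample(), and the names dropped and fitted_full are assumptions added for illustration.

```r
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable
dat <- Aids2
dat[sample(nrow(dat), 100), "diag"] <- NA  # 100 distinct rows get NA

model <- lm(death ~ diag + age, data = dat)

# Rows that lm() itself dropped; NULL when nothing was dropped
dropped <- model$na.action

# One slot per row of dat, NA by default
fitted_full <- rep(NA_real_, nrow(dat))
if (is.null(dropped)) {
  fitted_full[] <- fitted(model)
} else {
  fitted_full[-as.integer(dropped)] <- fitted(model)
}

# Store next to the original data so each row carries its fitted value
dat$fitted <- fitted_full
```

Reading the indices from the model avoids dropping too many rows when columns that are not in the formula also contain NA.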
Option 2
## the default NA action for "predict.lm" is "na.pass"
pred <- predict(model, newdata = dat) ## has to use "newdata = dat" here!
nrow(dat)
# 2843
length(pred)
# 2843
sum(!is.na(pred))
# 2745
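As a sanity check, one can confirm that the predictions are NA exactly where the predictor is missing, and that they match the stored fitted values everywhere else. This is only a sketch: the seed, the use of sample(), and the names same_na and same_values are assumptions, not part of the original answer.

```r
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable
dat <- Aids2
dat[sample(nrow(dat), 100), "diag"] <- NA

model <- lm(death ~ diag + age, data = dat)
pred <- predict(model, newdata = dat)

# NA appears in pred on the same rows where diag is NA
same_na <- all(is.na(pred) == is.na(dat$diag))

# On the remaining rows, predictions agree with the model's fitted values
same_values <- isTRUE(all.equal(unname(pred[!is.na(pred)]),
                                unname(fitted(model))))

same_na
same_values
```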
Option 3
Alternatively, you might simply pass a data frame without any NA to lm():
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
nrow(complete.dat)
# 2745
length(fit$fitted)
# 2745
sum(!is.na(fit$fitted))
# 2745
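With Option 3 the fitted values already align row for row with complete.dat. If they are also wanted on the original data frame, the row names that na.omit() preserves can serve as the key. A minimal sketch, assuming the default row names of Aids2; the seed, the use of sample(), and the column name fitted are additions for illustration.

```r
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable
dat <- Aids2
dat[sample(nrow(dat), 100), "diag"] <- NA

complete.dat <- na.omit(dat)  # keeps the original row names
fit <- lm(death ~ diag + age, data = complete.dat)

# Fitted values line up one-to-one with the rows of complete.dat
complete.dat$fitted <- fitted(fit)

# Map back to the full data frame by row name; dropped rows stay NA
dat$fitted <- NA_real_
dat[rownames(complete.dat), "fitted"] <- complete.dat$fitted
```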
In summary,
- Option 1 does the "alignment" directly by padding with NA, but I think people seldom take this approach;
- Option 2 is really simple, but it is more computationally costly, since predict() recomputes the values instead of reusing the stored fitted values;
- Option 3 is my favourite as it keeps everything simple.
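A related route, not one of the three options in this answer, is the na.action argument of lm(). With na.action = na.exclude, the extractor functions fitted() and residuals() pad their results with NA for the dropped rows. The sketch below assumes the same example data; the seed, the use of sample(), and the name model_ex are hypothetical.

```r
library(MASS)

set.seed(1)  # assumption: seed added only so the sketch is repeatable
dat <- Aids2
dat[sample(nrow(dat), 100), "diag"] <- NA

# na.exclude drops incomplete rows for fitting but remembers where they were
model_ex <- lm(death ~ diag + age, data = dat, na.action = na.exclude)

# fitted() now returns one value per row of dat, NA for the dropped rows
dat$fitted <- fitted(model_ex)

length(fitted(model_ex)) == nrow(dat)
```

Note that the padding is done by the extractor functions; the raw component model_ex$fitted.values still holds only the rows used in the fit.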