Impute missing data with mean by group
Problem description
I have a categorical variable with three levels (A, B, and C).
I also have a continuous variable with some missing values.
I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be replaced with the mean of group A.
I know I can just calculate each group's mean and replace the missing values, but I'm sure there is a more efficient way to do this with loops.
# Subset group A, then fill its NAs with the group A mean
A <- subset(data, group == "A")
mean(A$variable, na.rm = TRUE)
A$variable[is.na(A$variable)] <- mean(A$variable, na.rm = TRUE)
Now, I understand I could do the same for groups B and C, but perhaps a for loop (with if and else) might do the trick?
Recommended answer
library(dplyr)
# Within each group, replace NA with that group's mean
data %>%
  group_by(group) %>%
  mutate(variable = ifelse(is.na(variable), mean(variable, na.rm = TRUE), variable))
For a faster, base-R version, you can use ave:
data$variable <- ave(data$variable, data$group,
                     FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
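As a quick check of the ave approach, here is a small sketch on made-up data. The data frame df, its values, and the helper name impute_mean are assumptions for illustration; group D is included only to show what happens when a group has no observed values at all.

```r
# Toy data (hypothetical values); group D contains only a missing value
df <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C", "D"),
  variable = c(1, NA, 3, 10, NA, NA, 7, NA)
)

# Helper (hypothetical name): replace NAs in x with the mean of the non-missing values
impute_mean <- function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# ave() applies the helper within each level of df$group and
# returns a vector aligned with the original rows
df$variable <- ave(df$variable, df$group, FUN = impute_mean)

df$variable
# 1 2 3 10 10 7 7 NaN
```

Note that a group whose values are all missing ends up as NaN, since the mean of an empty set of values is NaN; such groups would need separate handling.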