Impute missing data with mean by group

Published: 2020/5/4 4:31:32. Tags: r, loops, missing-data, imputation


Question

I have a categorical variable with three levels (A, B, and C).

I also have a continuous variable with some missing values.

I would like to replace each NA value with the mean of its group. That is, missing observations from group A have to be replaced with the mean of group A.

I know I can just calculate each group's mean and replace the missing values one group at a time, but I'm sure there's a way to do this more efficiently with loops.

# Subset group A, then fill its missing values with the group mean
A <- subset(data, group == "A")
mean(A$variable, na.rm = TRUE)
A$variable[is.na(A$variable)] <- mean(A$variable, na.rm = TRUE)

Now, I understand I could do the same for groups B and C, but perhaps a for loop (with if and else) might do the trick?

Recommended answer

library(dplyr)
# Within each group, replace NA with that group's mean (assign the result to keep it)
data %>%
  group_by(group) %>%
  mutate(variable = ifelse(is.na(variable), mean(variable, na.rm = TRUE), variable))

For a faster, base-R version, you can use ave:

data$variable <- ave(data$variable, data$group,
                     FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
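
To make the result concrete, here is a minimal base-R sketch on a toy data frame. The values and the helper name impute_mean are made up for illustration and are not part of the original question or answer. It runs the ave approach and, for comparison, the explicit for loop the question asks about, so the two can be checked against each other.

```r
# Toy data (hypothetical values, chosen only for illustration)
data <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C", "C"),
  variable = c(1, NA, 3, 10, NA, NA, 4, 6)
)

# Hypothetical helper: replace the NAs in a vector with that vector's mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Approach 1: ave() applies the helper separately within each group
imputed_ave <- ave(data$variable, data$group, FUN = impute_mean)

# Approach 2: an explicit for loop over the groups, for comparison
imputed_loop <- data$variable
for (g in unique(data$group)) {
  in_group <- data$group == g
  imputed_loop[in_group] <- impute_mean(imputed_loop[in_group])
}

print(imputed_ave)   # values: 1 2 3 10 10 5 4 6
print(imputed_loop)  # same values as imputed_ave
```

Both approaches give the same vector here. The loop needs no if/else branching because the logical index already selects one group at a time. One caveat under either approach: if every value in a group is NA, mean(x, na.rm = TRUE) returns NaN, so that group's values become NaN rather than a number.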
