Fill in mean values for NA in every column of a data frame


Question

Suppose I have a data frame df

df=data.frame(x=1:20,y=c(1:10,rep(NA,10)),z=c(rep(NA,5),1:15))

I know that to replace NAs with the mean value of a given column, we can use

df$x[is.na(df$x)] = mean(df$x, na.rm=TRUE)

What I am trying to find is a single command that does this for all columns at once, instead of repeating it for every column.

I suspect I need to use sapply with a function. I tried something like this, but it clearly does not work

sapply(df,function(x) df[is.na(df$x)]=mean(df$x,na.rm=T))

Any suggestions would be great. I searched previous posts but could not find a similar problem being addressed.

Recommended answer

We can use na.aggregate from the zoo package. One option is to apply na.aggregate to each column separately, which we can do with lapply. If we are using data.table, convert the data.frame to a data.table with setDT(df), loop over the columns, and apply na.aggregate. This replaces each NA with the mean of the non-NA values in its column.

library(zoo)
library(data.table)
setDT(df)[, names(df) := lapply(.SD, na.aggregate)][]
#     x    y  z
# 1:  1  1.0  8
# 2:  2  2.0  8
# 3:  3  3.0  8
# 4:  4  4.0  8
# 5:  5  5.0  8
# 6:  6  6.0  1
# 7:  7  7.0  2
# 8:  8  8.0  3
# 9:  9  9.0  4
#10: 10 10.0  5
#11: 11  5.5  6
#12: 12  5.5  7
#13: 13  5.5  8
#14: 14  5.5  9
#15: 15  5.5 10
#16: 16  5.5 11
#17: 17  5.5 12
#18: 18  5.5 13
#19: 19  5.5 14
#20: 20  5.5 15
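
The lapply route mentioned above also works on a plain data.frame, without data.table. The following is a minimal sketch, assuming the zoo package is installed; df2 is just an illustrative name for a fresh copy of the example data.

```r
library(zoo)

# Fresh copy of the example data (df2 is an illustrative name)
df2 <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)), z = c(rep(NA, 5), 1:15))

# Assigning into df2[] keeps the data.frame structure while replacing every column
df2[] <- lapply(df2, na.aggregate)

# Count the remaining NAs per column
colSums(is.na(df2))
```

Assigning to df2[] rather than df2 matters here: a plain df2 <- lapply(...) would leave a list instead of a data frame.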


Or we can use na.aggregate directly on the dataset.

na.aggregate(df)
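
If adding the zoo dependency is not desirable, roughly the same result can probably be reached in base R. The sketch below is an alternative to the answer above, not part of it. The helper name fill_mean and the copy df3 are hypothetical, and the code assumes every column is numeric.

```r
# Fresh copy of the example data (df3 is an illustrative name)
df3 <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)), z = c(rep(NA, 5), 1:15))

# Hypothetical helper: replace the NAs in one numeric vector
# with the mean of its non-NA values, then return the vector
fill_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

# Apply the helper to every column and write the results back into the data frame
df3[] <- lapply(df3, fill_mean)
```

This also suggests why the sapply attempt in the question fails. Inside function(x), x is already the column itself, so df$x does not refer to the column being processed. Any assignment to df inside the function also only changes a local copy. Returning the modified vector and assigning the result back avoids both problems.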
