How do I replace all NA with mean in R?
Problem description
I have over 1500 columns in my dataset, and more than 100 of them contain at least one NA. I know I can replace the NAs in a single column with
d$var[is.na(d$var)] <- mean(d$var, na.rm=TRUE)
but how do I do this for ALL the NAs in my dataset?
Thanks!
Recommended answer
We can use na.aggregate from the zoo package. Loop through the columns of the dataset (assuming all the columns are numeric), apply na.aggregate to replace each NA with the column mean (the default), and assign the result back to the dataset.
library(zoo)
df[] <- lapply(df, na.aggregate)
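As a minimal sketch of what the two lines above do, the following applies them to a small made-up data frame. The data and the column names `a` and `b` are assumptions for illustration only, and the zoo package is assumed to be installed.

```r
library(zoo)  # assumes the zoo package is installed

# Hypothetical toy data: two numeric columns, each with one NA
df <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20))

# Replace the NAs in every column with that column's mean.
# df[] keeps the data.frame structure while overwriting its columns.
df[] <- lapply(df, na.aggregate)

print(df)
```

Each NA is filled from the non-missing values of its own column, so column `a` is filled from 1 and 3, and column `b` from 10 and 20.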
By default, the FUN argument of na.aggregate is mean:
Default S3 method:
na.aggregate(object, by = 1, ..., FUN = mean, na.rm = FALSE, maxgap = Inf)
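Because FUN is an ordinary argument, another summary function can presumably be passed in its place. The sketch below compares the default with `FUN = median` on a made-up vector; the values and variable names are assumptions for illustration, and zoo is assumed to be installed.

```r
library(zoo)  # assumes the zoo package is installed

# Hypothetical vector with one NA and one large value
x <- c(1, NA, 3, 100)

# Default: the NA is filled with the mean of the non-missing values
filled_mean <- na.aggregate(x)

# Alternative: fill with the median, which is less sensitive to the large value
filled_median <- na.aggregate(x, FUN = median)

print(filled_mean)
print(filled_median)
```

The median may be the better choice when a column contains outliers, but that is a judgment call that depends on the data.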
To do this non-destructively, work on a copy so the original df is left unchanged:
df2 <- df
df2[] <- lapply(df2, na.aggregate)
or in one line:
df2 <- replace(df, TRUE, lapply(df, na.aggregate))
If there are non-numeric columns, do this only for the numeric columns by creating a logical index first:
ok <- sapply(df, is.numeric)
df[ok] <- lapply(df[ok], na.aggregate)
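For readers who would rather not depend on zoo, the same numeric-only replacement can be sketched in base R by reusing the single-column idiom from the question. The helper `fill_mean` and the toy data are hypothetical names introduced here for illustration; this is not part of the original answer.

```r
# Hypothetical toy data with one character column and one numeric column
df <- data.frame(
  id    = c("x", NA, "z"),
  score = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the NAs in a numeric vector with its mean
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Logical index of the numeric columns
ok <- sapply(df, is.numeric)

# Fill only the numeric columns; non-numeric columns are left untouched
df[ok] <- lapply(df[ok], fill_mean)

print(df)
```

Note that the NA in the character column `id` is deliberately left in place, since a mean is not defined for it.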