删除缺少x%的列/行 [英] Delete columns/rows with more than x% missing
本文介绍了删除缺少x%的列/行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我要删除数据框中所有 NA
s超过50%的列或行。
I want to delete all columns or rows with more than 50% NA
s in a data frame.
这是我的解决方案:
# delete columns with more than 50% missings
miss <- c()
for(i in 1:ncol(data)) {
if(length(which(is.na(data[,i]))) > 0.5*nrow(data)) miss <- append(miss,i)
}
data2 <- data[,-miss]
# delete rows with more than 50% percent missing
miss2 <- c()
for(i in 1:nrow(data)) {
if(length(which(is.na(data[i,]))) > 0.5*ncol(data)) miss2 <- append(miss2,i)
}
data <- data[-miss,]
,但是我正在寻找更好/更快的解决方案。
but I'm looking for a nicer/faster solution.
我也希望得到 dplyr
解决方案
推荐答案
要删除具有一定数量NA的列,可以使用
colMeans(is.na(...))
To remove columns with some amount of NA, you can use
colMeans(is.na(...))
## Some sample data
set.seed(0)
dat <- matrix(1:100, 10, 10)
dat[sample(1:100, 50)] <- NA
dat <- data.frame(dat)
## Remove columns with more than 50% NA
dat[, which(colMeans(!is.na(dat)) > 0.5)]
## Remove rows with more than 50% NA
dat[which(rowMeans(!is.na(dat)) > 0.5), ]
## Remove columns and rows with more than 50% NA
dat[which(rowMeans(!is.na(dat)) > 0.5), which(colMeans(!is.na(dat)) > 0.5)]
这篇关于删除缺少x%的列/行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文