Show columns with NAs in a data.frame
Question
I'd like to show the names of the columns in a large data frame that contain missing values. Basically, I want the equivalent of complete.cases(df), but for columns rather than rows. Some of the columns are non-numeric, so something like
names(df[is.na(colMeans(df))])
returns "Error in colMeans(df) : 'x' must be numeric." So my current solution is to transpose the data frame and run complete.cases, but I'm guessing there's some variant of apply (or something in plyr) that's much more efficient.
nacols <- function(df) {
    names(df[,!complete.cases(t(df))])
}
w <- c("hello","goodbye","stuff")
x <- c(1,2,3)
y <- c(1,NA,0)
z <- c(1,0, NA)
tmp <- data.frame(w,x,y,z)
nacols(tmp)
[1] "y" "z"
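The colMeans attempt fails because that function only accepts numeric input, whereas is.na works on columns of any type. A minimal sketch of a column-wise check built on that observation follows; it is not taken from the original thread, and the names na_counts and na_col_names are made up for illustration.

```r
# Same example data as in the question.
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# is.na() on a data frame returns a logical matrix of the same shape,
# so it handles character and numeric columns alike.
na_counts <- colSums(is.na(tmp))

# Keep the names of columns with at least one missing value.
na_col_names <- names(tmp)[na_counts > 0]
print(na_col_names)
```

On the example data this should select the same two columns, y and z, without transposing the data frame.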
Can someone show me a more efficient function to identify columns that have NAs?
Recommended answer
This is the fastest way that I know of:
unlist(lapply(df, function(x) any(is.na(x))))
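To make the result of that expression concrete, here it is applied to the tmp data frame from the question. This is only an illustration; the variable name flags is an assumption and does not appear in the original answer.

```r
# Example data from the question.
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# lapply() visits each column in turn; unlist() collapses the list of
# single TRUE/FALSE values into a named logical vector.
flags <- unlist(lapply(tmp, function(x) any(is.na(x))))
print(flags)
# w = FALSE, x = FALSE, y = TRUE, z = TRUE
```

Because a data frame is a list of columns, each column is tested on its own and the non-numeric column w causes no error.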
Edit:
I guess everyone else wrote theirs out as a complete function, so here is mine:
nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
And if you microbenchmark the 4 solutions on a Windows 7 machine:
Unit: microseconds
    expr     min      lq  median      uq        max
1 ANDRIE  85.380  91.911 106.375 116.639    863.124
2 MANOEL  87.712  93.778 105.908 118.971   8426.886
3  MOIRA 764.215 798.273 817.402 876.188 143039.632
4  TYLER  51.321  57.853  62.518  72.316   1365.136
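A comparison along these lines could be set up roughly as follows. This is a sketch, not the script behind the numbers above: it times only the two functions whose code appears on this page, the names nacols_transpose and nacols_lapply are hypothetical, and it assumes the optional microbenchmark package may or may not be installed.

```r
# The question's approach: transpose, then look for incomplete "rows".
nacols_transpose <- function(df) {
  names(df[, !complete.cases(t(df))])
}

# The answer's approach: test each column separately.
nacols_lapply <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}

tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# Check that both approaches agree before comparing their speed.
same_result <- identical(nacols_transpose(tmp), nacols_lapply(tmp))
print(same_result)

# Only run the timing when the add-on package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  timings <- microbenchmark::microbenchmark(
    transpose = nacols_transpose(tmp),
    lapply    = nacols_lapply(tmp),
    times = 100L
  )
  print(timings)
}
```

Absolute timings depend on the machine and on the size of the data frame, so results from a three-row example say little about a large one.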
[Figure: plot of the benchmark timings above]
Edit: At the time I wrote this, anyNA did not exist or I was unaware of it. It may speed things up even more. Per the help manual for ?anyNA:
"The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors)."
nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}
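As a quick sanity check, the anyNA version can be compared against the any(is.na(x)) version on the question's data. The sketch below swaps in vapply for unlist(lapply(...)), which is a substitution and not part of the original answer; it makes the one-logical-per-column result type explicit. The names has_na and has_na_old are made up for illustration.

```r
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# One TRUE/FALSE per column, using anyNA() directly as the function.
has_na <- vapply(tmp, anyNA, logical(1))

# The earlier formulation, for comparison.
has_na_old <- vapply(tmp, function(x) any(is.na(x)), logical(1))

print(identical(has_na, has_na_old))
print(names(tmp)[has_na])
```

Both formulations should flag the same columns; any speed difference would only show up on larger data.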