Show columns with NAs in a data.frame

Question

I'd like to show the names of columns in a large dataframe that contain missing values. Basically, I want the equivalent of complete.cases(df) but for columns, not rows. Some of the columns are non-numeric, so something like

names(df[is.na(colMeans(df))])

returns "Error in colMeans(df) : 'x' must be numeric." So, my current solution is to transpose the dataframe and run complete.cases, but I'm guessing there's some variant of apply (or something in plyr) that's much more efficient.

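nacols <- function(df) {
  # Index the names rather than the data frame: df[, i] drops to a plain
  # vector when only one column is selected, and names() then returns NULL
  names(df)[!complete.cases(t(df))]
}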

w <- c("hello","goodbye","stuff")
x <- c(1,2,3)
y <- c(1,NA,0)
z <- c(1,0, NA)
tmp <- data.frame(w,x,y,z)

nacols(tmp)
[1] "y" "z"

Can someone show me a more efficient function to identify columns that have NAs?
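
As a side note, here is a small sketch of why the transpose route tends to be costly, using the same example data as above. t() first converts the data frame to a matrix, and because one column is character, every value is converted to a string before complete.cases() runs.

```r
# Same example data as in the question
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

m <- t(tmp)    # data frame -> matrix, one row per original column
typeof(m)      # "character": the numeric columns were converted to strings
dim(m)         # 4 rows (columns of tmp) by 3 columns (rows of tmp)

# NAs survive the conversion, which is why complete.cases() still works here
na_rows <- rownames(m)[!complete.cases(m)]
na_rows        # "y" "z"
```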

Recommended answer

This is the fastest way that I know of:

unlist(lapply(df, function(x) any(is.na(x))))
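
To make the result of that one-liner concrete, here is a minimal sketch on the example data from the question (the variable name has_na is only for illustration). lapply() visits each column as a vector, so no coercion to a matrix takes place, and unlist() turns the list of single logicals into a named logical vector.

```r
tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# One TRUE/FALSE per column, named after the column
has_na <- unlist(lapply(tmp, function(x) any(is.na(x))))
has_na
#     w     x     y     z
# FALSE FALSE  TRUE  TRUE

# The names of the TRUE entries are the columns that contain NAs
names(has_na)[has_na]   # "y" "z"
```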

Edit: everyone else wrote out a complete function, so here is the complete version:

nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}
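
A short usage sketch for the function above. The data frames df1 and df2 are made up for illustration; they show that character columns are handled and that a data frame without any NAs gives an empty result.

```r
nacols <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}

# Hypothetical data: NAs in a character column and in a numeric column
df1 <- data.frame(
  id    = 1:3,
  name  = c("a", NA, "c"),
  score = c(0.5, 1.5, NA)
)
res1 <- nacols(df1)
res1   # "name" "score"

# Hypothetical data without any NAs
df2 <- data.frame(a = 1:2, b = c("x", "y"))
res2 <- nacols(df2)
res2   # character(0)
```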

Microbenchmarking the 4 solutions on a Windows 7 machine gives:

Unit: microseconds
    expr     min      lq  median      uq        max
1 ANDRIE  85.380  91.911 106.375 116.639    863.124
2 MANOEL  87.712  93.778 105.908 118.971   8426.886
3  MOIRA 764.215 798.273 817.402 876.188 143039.632
4  TYLER  51.321  57.853  62.518  72.316   1365.136
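
The code for the other answers in that table is not shown here, so the following is only a sketch of how such a comparison could be set up, limited to the two approaches that do appear on this page: the transpose approach from the question and the lapply() approach from this answer. The function names are made up for illustration, and the timing step assumes the microbenchmark package is installed. Timings will differ by machine and data size.

```r
# Transpose approach from the question (hypothetical name)
nacols_transpose <- function(df) {
  names(df)[!complete.cases(t(df))]
}

# lapply approach from this answer (hypothetical name)
nacols_lapply <- function(df) {
  colnames(df)[unlist(lapply(df, function(x) any(is.na(x))))]
}

tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

# Check that both give the same answer before timing them
same_result <- identical(nacols_transpose(tmp), nacols_lapply(tmp))

# Run the timing only if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    transpose = nacols_transpose(tmp),
    lapply    = nacols_lapply(tmp),
    times     = 100L
  ))
}
```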

[Figure: plot of the benchmark timings above]

Edit: At the time I wrote this, anyNA did not exist or I was unaware of it. It may speed things up further. Per the help page ?anyNA:


The generic function anyNA implements any(is.na(x)) in a possibly faster way (especially for atomic vectors).



nacols <- function(df) {
    colnames(df)[unlist(lapply(df, function(x) anyNA(x)))]
}
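
A possible variant, offered as a sketch rather than as part of the original answer: since anyNA already takes a single argument, it can be passed directly, and vapply() with logical(1) guarantees one logical value per column. The name nacols_vapply is made up for illustration; anyNA requires R 3.1.0 or later.

```r
# Hypothetical variant of nacols() using vapply() for a type-stable result
nacols_vapply <- function(df) {
  names(df)[vapply(df, anyNA, logical(1))]
}

tmp <- data.frame(
  w = c("hello", "goodbye", "stuff"),
  x = c(1, 2, 3),
  y = c(1, NA, 0),
  z = c(1, 0, NA)
)

res <- nacols_vapply(tmp)
res   # "y" "z"
```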
