Remove rows where all variables are NA using dplyr

Posted: 2021/12/23 12:28:07 · Tags: r, dplyr, tidyverse

Question

I'm having some issues with a seemingly simple task: using dplyr to remove all rows in which every variable is NA. I know it can be done in base R (see "Remove rows in R matrix where all data is NA" and "Removing empty rows of a data file in R"), but I'm curious whether there is a simple way of doing it with dplyr.

Example:

library(tidyverse)
dat <- tibble(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA))
filter(dat, !is.na(a) | !is.na(b) | !is.na(c))

The filter call above does what I want, but it is infeasible in the situation I'm facing because there is a large number of variables. I guess one could do it with filter_ by first building a string containing the (long) logical statement, but it seems like there should be a simpler way.
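For comparison, the base R route mentioned in the question can be sketched as below. It is a minimal sketch, not taken from the linked answers: it counts the non-missing values per row with rowSums() and keeps rows where that count is above zero, so it does not depend on the number of columns. The names keep and result are illustrative.

```r
# Same example data as above, built as a plain data frame
dat <- data.frame(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA))

# TRUE for rows that have at least one non-missing value
keep <- rowSums(!is.na(dat)) > 0

# Drop only the rows in which every column is NA
result <- dat[keep, ]
nrow(result)
```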

Another way is to use rowwise() and do():

na <- dat %>% 
  rowwise() %>% 
  do(tibble(na = !all(is.na(.)))) %>% 
  .$na
filter(dat, na)

That gets the job done, but it does not look too nice. Other ideas?

Recommended answer

Since dplyr 0.7.0, scoped filtering verbs exist. Using filter_all() together with any_vars(), you can easily keep the rows that have at least one non-missing column:

# dplyr >= 0.7.0 (scoped verbs, superseded as of 1.0.0)
dat %>% filter_all(any_vars(!is.na(.)))

Using @hejseb's benchmarking code, this solution appears to be as efficient as f4, one of the variants in that benchmark.

Update:

Since dplyr 1.0.0 the scoped verbs above are superseded. In their place the across() family of functions was introduced, which applies a function to multiple (or all) columns. Its filtering helper if_any() arrived in dplyr 1.0.4. Keeping the rows in which at least one column is not NA now looks like this:

# dplyr >= 1.0.4 (if_any() was added in 1.0.4)
dat %>% filter(if_any(everything(), ~ !is.na(.)))
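As a quick check, the sketch below runs the if_any() filter on the example data from the question and contrasts it with if_all(), which keeps only rows where every column is non-missing. The names kept_any and kept_all are illustrative, and dplyr 1.0.4 or later is assumed.

```r
library(dplyr)

dat <- tibble(a = c(1, 2, NA), b = c(1, NA, NA), c = c(2, NA, NA))

# Keep rows where at least one column is not NA (drops only the all-NA row)
kept_any <- dat %>% filter(if_any(everything(), ~ !is.na(.x)))

# For contrast: keep rows where every column is not NA (complete rows only)
kept_all <- dat %>% filter(if_all(everything(), ~ !is.na(.x)))

nrow(kept_any)  # 2
nrow(kept_all)  # 1
```

With many columns, everything() can be replaced by any tidyselect expression to restrict the check to a subset of variables.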
