Remove rows where all columns are NA except 2 columns

Question

I have a data.table. I want to remove the rows in which every column except 2 specific columns is NA. For example, I have a data.table like:

> ww2
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
 1:          5.1         3.5          1.4         0.2  setosa     1
 2:          4.9         3.0          1.4         0.2  setosa     2
 3:          4.7         3.2          1.3         0.2  setosa     3
 4:          4.6         3.1          1.5         0.2  setosa     4
 5:          5.0         3.6          1.4         0.2  setosa     5
 6:          5.1         3.5          1.4         0.2 dffdsdf     1
 7:          4.9         3.0          1.4         0.2 dffdsdf     2
 8:          4.7         3.2          1.3         0.2 dffdsdf     3
 9:           NA          NA           NA          NA dffdsdf     4
10:           NA          NA           NA          NA dffdsdf     5

Its dput() output is:

    structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.1, 4.9, 
4.7, NA, NA), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.5, 3, 
3.2, NA, NA), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.4, 
1.4, 1.3, NA, NA), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 
0.2, 0.2, NA, NA), Species = structure(c(1L, 1L, 1L, 1L, 1L, 
4L, 4L, 4L, 4L, 4L), class = "factor", .Label = c("setosa", "versicolor", 
"virginica", "dffdsdf")), index = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 
3L, 4L, 5L)), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length", 
"Petal.Width", "Species", "index"), row.names = c(NA, -10L), class = "data.frame")

In the table above, I want to remove rows 9 and 10. My actual data.table is much larger and has many more columns, so it is impractical to list explicitly the columns that are NA. The columns that are not NA, however, are fixed: there are 2 of them, and in this example they are index and Species.

I am looking for an efficient and fast solution to this.

Recommended answer

Given the data you provided, I would do something like:

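library(dplyr)

# TRUE for every row that has at least one NA outside Species and index
na_rows = ww2 %>% 
            select(-Species, -index) %>% 
            is.na() %>% 
            rowSums() > 0

# Keep only the rows that were not flagged
ww2 %>% 
  filter(!na_rows)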

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
1          5.1         3.5          1.4         0.2  setosa     1
2          4.9         3.0          1.4         0.2  setosa     2
3          4.7         3.2          1.3         0.2  setosa     3
4          4.6         3.1          1.5         0.2  setosa     4
5          5.0         3.6          1.4         0.2  setosa     5
6          5.1         3.5          1.4         0.2 dffdsdf     1
7          4.9         3.0          1.4         0.2 dffdsdf     2
8          4.7         3.2          1.3         0.2 dffdsdf     3

Or, in a style closer to default R (I prefer dplyr):

# .SDcols = !c(...) selects every column except Species and index (ww2 must be a data.table)
na_rows = rowSums(is.na(ww2[, .SD, .SDcols = !c('Species', 'index')])) > 0
ww2[!na_rows,]
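
Both snippets above flag a row as soon as any of the checked columns is NA, which gives the requested result for the sample data but is broader than "all columns are NA". The following is a minimal sketch of the stricter reading, assuming the data.table package is installed; the table dt and the names keep_cols, check_cols, na_count, all_na and result are made up for illustration and do not come from the original post.

```r
library(data.table)

# Hypothetical example table: row 2 is partly NA, rows 3 and 4 are NA
# in every column except Species and index.
dt <- data.table(
  a       = c(1.0, 2.0, NA, NA),
  b       = c(0.5, NA, NA, NA),
  Species = c("x", "x", "y", "y"),
  index   = 1:4
)

keep_cols  <- c("Species", "index")          # the 2 columns that are never NA
check_cols <- setdiff(names(dt), keep_cols)  # every other column

# Count the NAs per row across the checked columns only.
na_count <- rowSums(is.na(dt[, check_cols, with = FALSE]))

# Drop a row only when every checked column is NA.
all_na <- na_count == length(check_cols)
result <- dt[!all_na]
```

Here result keeps rows 1 and 2, because row 2 still has a value in column a. With the na_count > 0 test used in the answer, row 2 would be dropped as well, so the right comparison depends on whether partly filled rows should survive.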
