Remove rows where all columns are NA except 2 columns
Question
I have a data.table. I want to remove the rows in which every column except two specific columns is NA. For example, I have a data.table like:
> ww2
Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
1: 5.1 3.5 1.4 0.2 setosa 1
2: 4.9 3.0 1.4 0.2 setosa 2
3: 4.7 3.2 1.3 0.2 setosa 3
4: 4.6 3.1 1.5 0.2 setosa 4
5: 5.0 3.6 1.4 0.2 setosa 5
6: 5.1 3.5 1.4 0.2 dffdsdf 1
7: 4.9 3.0 1.4 0.2 dffdsdf 2
8: 4.7 3.2 1.3 0.2 dffdsdf 3
9: NA NA NA NA dffdsdf 4
10: NA NA NA NA dffdsdf 5
Its dput() output is:
structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.1, 4.9,
4.7, NA, NA), Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.5, 3,
3.2, NA, NA), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.4,
1.4, 1.3, NA, NA), Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, NA, NA), Species = structure(c(1L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 4L), class = "factor", .Label = c("setosa", "versicolor",
"virginica", "dffdsdf")), index = c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L)), .Names = c("Sepal.Length", "Sepal.Width", "Petal.Length",
"Petal.Width", "Species", "index"), row.names = c(NA, -10L), class = "data.frame")
In the data table above I want to remove rows 9 and 10. Since my actual data table is really big and has many more columns, it is difficult to list explicitly the columns that are NA. The columns that are not NA, however, are fixed: there are 2 of them, and in this particular example they are index and Species.
I am looking for an efficient and fast solution to this.
Recommended answer
Given the data you provided, I would do something like:
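library(dplyr)
# Flag rows that contain at least one NA outside Species and index
na_rows = ww2 %>%
  select(-Species, -index) %>%
  is.na() %>%
  rowSums() > 0
# Keep only the rows that were not flagged
ww2 %>%
  filter(!na_rows)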
Sepal.Length Sepal.Width Petal.Length Petal.Width Species index
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 2
3 4.7 3.2 1.3 0.2 setosa 3
4 4.6 3.1 1.5 0.2 setosa 4
5 5.0 3.6 1.4 0.2 setosa 5
6 5.1 3.5 1.4 0.2 dffdsdf 1
7 4.9 3.0 1.4 0.2 dffdsdf 2
8 4.7 3.2 1.3 0.2 dffdsdf 3
or, in more default R style (I like dplyr):
library(data.table)  # .SD / .SDcols require ww2 to be a data.table
na_rows = rowSums(is.na(ww2[, .SD, .SDcols = !c('Species', 'index')])) > 0
ww2[!na_rows, ]
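Note that the `> 0` test in the snippets above drops any row with at least one NA outside Species and index. For the example data that gives the same result, but if rows with only some NAs should be kept, the test has to require that every checked column is NA. Here is a minimal base R sketch of that stricter variant. The data frame `df` and the names `keep_cols`, `check_cols` and `all_na` are made up for illustration and are not part of the original answer.

```r
# Small illustrative data frame (hypothetical values, same shape as ww2).
df <- data.frame(
  Sepal.Length = c(5.1, NA, NA, 4.7),
  Sepal.Width  = c(3.5, 3.0, NA, NA),
  Species      = c("setosa", "setosa", "dffdsdf", "dffdsdf"),
  index        = c(1L, 2L, 3L, 4L)
)

# Columns that are allowed to be non-NA; every other column is checked.
keep_cols  <- c("Species", "index")
check_cols <- setdiff(names(df), keep_cols)

# TRUE where every checked column is NA in that row.
all_na <- rowSums(is.na(df[, check_cols, drop = FALSE])) == length(check_cols)

# Rows 2 and 4 have only some NAs, so they are kept; only row 3 is dropped.
result <- df[!all_na, , drop = FALSE]
print(result)
```

Because the checked columns are derived with `setdiff()`, only the two fixed columns need to be named, which suits a wide table. For a data.table, the same comparison could presumably be applied to `ww2[, .SD, .SDcols = check_cols]` instead of the data.frame subset.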