Remove rows with all or some NAs (missing values) in data.frame
Question
I'd like to remove the lines in this data frame that:
a) contain NAs, checking across all columns. Below is my example data frame.
             gene hsap mmul mmus rnor cfam
1 ENSG00000208234    0   NA   NA   NA   NA
2 ENSG00000199674    0    2    2    2    2
3 ENSG00000221622    0   NA   NA   NA   NA
4 ENSG00000207604    0   NA   NA    1    2
5 ENSG00000207431    0   NA   NA   NA   NA
6 ENSG00000221312    0    1    2    3    2
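To make the example reproducible, the table above can be rebuilt in R roughly as follows. This is a sketch under assumptions: the name final is borrowed from the answer below, and loading the data with read.table is only one way the original data might have been created. Because the header line has one field fewer than the data lines, read.table uses the leading row numbers as row names.

```r
# Rebuild the example table (the name `final` and read.table are assumptions)
final <- read.table(header = TRUE, text = "
  gene hsap mmul mmus rnor cfam
1 ENSG00000208234 0 NA NA NA NA
2 ENSG00000199674 0  2  2  2  2
3 ENSG00000221622 0 NA NA NA NA
4 ENSG00000207604 0 NA NA  1  2
5 ENSG00000207431 0 NA NA NA NA
6 ENSG00000221312 0  1  2  3  2
")
print(final)
```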
Basically, I'd like to get a data frame such as the following.
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2
b) contain NAs, checking only some of the columns, so I can also get this result:
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
Recommended answer
Also check complete.cases:
> final[complete.cases(final), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2
na.omit is simpler if you just want to remove every row that contains an NA. complete.cases also allows partial selection, by passing only certain columns of the data frame:
> final[complete.cases(final[, 5:6]), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
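A sketch comparing na.omit with complete.cases on the same rebuilt example data (read.table is only used to recreate the table). It also selects the columns by name instead of by position; rnor and cfam are simply the columns at positions 5:6 in the example. Note that na.omit attaches an "na.action" attribute listing the dropped rows, so both results keep the same rows but are not strictly identical objects.

```r
# Rebuild the example data (read.table is only used to recreate the table)
final <- read.table(header = TRUE, text = "
  gene hsap mmul mmus rnor cfam
1 ENSG00000208234 0 NA NA NA NA
2 ENSG00000199674 0  2  2  2  2
3 ENSG00000221622 0 NA NA NA NA
4 ENSG00000207604 0 NA NA  1  2
5 ENSG00000207431 0 NA NA NA NA
6 ENSG00000221312 0  1  2  3  2
")

# na.omit() drops every row that has an NA in any column
by_omit <- na.omit(final)
by_cc <- final[complete.cases(final), ]
print(rownames(by_omit))           # same rows as by_cc: "2" "6"
print(attr(by_omit, "na.action"))  # named indices of the dropped rows

# Partial selection by column name rather than by position
by_cols <- final[complete.cases(final[, c("rnor", "cfam")]), ]
print(by_cols)
```

If the extra attribute gets in the way, attr(by_omit, "na.action") <- NULL removes it.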
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[, 5:6])) == 0, ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
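To check that the rowSums(is.na(...)) == 0 mask selects exactly the same rows as complete.cases on the same columns, a minimal sketch on the rebuilt example data (again assuming it is loaded with read.table):

```r
# Rebuild the example data (read.table is only used to recreate the table)
final <- read.table(header = TRUE, text = "
  gene hsap mmul mmus rnor cfam
1 ENSG00000208234 0 NA NA NA NA
2 ENSG00000199674 0  2  2  2  2
3 ENSG00000221622 0 NA NA NA NA
4 ENSG00000207604 0 NA NA  1  2
5 ENSG00000207431 0 NA NA NA NA
6 ENSG00000221312 0  1  2  3  2
")

# Columns 5:6 (rnor, cfam), as in the answer
cols <- final[, 5:6]

# is.na() gives a logical matrix; rowSums() counts the NAs in each row
mask_rowsums <- rowSums(is.na(cols)) == 0
mask_cc <- complete.cases(cols)

# unname() drops any row names so only the TRUE/FALSE values are compared
stopifnot(identical(unname(mask_rowsums), mask_cc))
print(final[mask_cc, ])
```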
But using complete.cases is much clearer, and faster.