Remove rows with NAs (missing values) in data.frame
Problem description
I'd like to remove the rows in this data frame that contain NAs in any of the columns. Below is my example data frame.
             gene hsap mmul mmus rnor cfam
1 ENSG00000208234    0   NA   NA   NA   NA
2 ENSG00000199674    0    2    2    2    2
3 ENSG00000221622    0   NA   NA   NA   NA
4 ENSG00000207604    0   NA   NA    1    2
5 ENSG00000207431    0   NA   NA   NA   NA
6 ENSG00000221312    0    1    2    3    2
Basically, I'd like to get a data frame such as the following.
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2
Also, I'd like to know how to filter on only some of the columns, so I can also get a data frame like this:
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
Recommended answer
Also take a look at complete.cases:
> final[complete.cases(final),]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2
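To try the call above yourself, you first need a data frame named final. The sketch below rebuilds it from the values printed in the question. The column types (character for gene, numeric for the rest) and the helper names keep and complete_rows are assumptions made for illustration, not part of the original post.

```r
# Rebuild the example data frame from the question (values copied from the post;
# the column types are an assumption).
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2)
)

# complete.cases() returns one logical value per row:
# TRUE when the row contains no NA in any column.
keep <- complete.cases(final)

# Indexing the rows with that logical vector keeps only the complete rows.
complete_rows <- final[keep, ]
print(complete_rows)
```

The original row names are preserved by the subsetting, which is why the result is labelled with the row numbers from the input rather than being renumbered.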
na.omit is nicer for just removing all NAs. complete.cases allows partial selection by passing it only part of the data frame:
> final[complete.cases(final[,5:6]),]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
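As a minimal sketch of both approaches side by side, the code below applies na.omit to the whole data frame and complete.cases to a subset of columns. It selects the columns by name (rnor and cfam, which are columns 5 and 6 in the example) rather than by position; that is a stylistic choice of this sketch, not something the answer prescribes. The data frame is rebuilt from the question's values, and the variable names cols, all_complete and partial are hypothetical.

```r
# Example data rebuilt from the question (column types are an assumption).
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2)
)

# na.omit() drops every row that has an NA in any column.
all_complete <- na.omit(final)

# To test only some columns, hand complete.cases() just those columns.
# Names are used here instead of the positions 5:6.
cols <- c("rnor", "cfam")
partial <- final[complete.cases(final[, cols]), ]

print(all_complete)
print(partial)
```

Selecting by name tends to survive column reordering better than positional indices, at the cost of a little more typing.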
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[,5:6]))==0,]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
but using complete.cases is quite a lot clearer, and faster.
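The asker's own attempt is not shown here, so the following is only a sketch of the mechanics behind the is.na version. Applied to a data frame, is.na returns a logical matrix with one cell per value, which cannot index rows directly; rowSums collapses it to an NA count per row, and comparing that count with 0 gives the same row filter as complete.cases. The data frame toy and the other variable names are hypothetical.

```r
# A small hypothetical data frame, used only to show the intermediate values.
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# is.na() on a data frame yields a logical matrix, one cell per value.
na_matrix <- is.na(toy)

# rowSums() counts the TRUE cells, i.e. the number of NAs in each row.
na_per_row <- rowSums(na_matrix)

# Rows with zero NAs are exactly the rows complete.cases() flags as TRUE.
by_is_na <- toy[na_per_row == 0, ]
by_complete_cases <- toy[complete.cases(toy), ]

print(na_per_row)
print(by_is_na)
```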