How to subset a data.frame if the column contains NAs
Question
R (version 3.3.3) is giving me some unexpected behavior when subsetting a data frame on a condition based on a character column. Here is an example:
foo <- data.frame(bar = c('a', NA, 'b', 'a'),
                  baz = 1:4,
                  stringsAsFactors = FALSE)
foo
which looks like this:
   bar baz
1    a   1
2 <NA>   2
3    b   3
4    a   4
I want to get all rows of this data frame where bar != "a", so I call:
foo[foo$bar != 'a', ]
This returns:
    bar baz
NA <NA>  NA
3     b   3
I do not understand why the first entry in the second column is NA and not 2. Please help me explain this strange behavior.
Recommended answer
While I'm trying to understand the behaviour, the right/better way to filter on a character column in R is to use the %in% operator.
foo <- data.frame(bar = c('a', NA, 'b', 'a'),
                  baz = 1:4,
                  stringsAsFactors = FALSE)
foo[!(foo$bar %in% 'a'), ]
Output:
> foo[!(foo$bar %in% 'a'), ]
   bar baz
2 <NA>   2
3    b   3
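To see why the two calls differ, it can help to print the logical vectors that are passed as the row index. The following is a minimal sketch using the same foo as above; the names neq_mask and in_mask are introduced here only for illustration.

```r
foo <- data.frame(bar = c('a', NA, 'b', 'a'),
                  baz = 1:4,
                  stringsAsFactors = FALSE)

# != propagates NA: comparing a missing value with 'a' has an unknown result.
neq_mask <- foo$bar != 'a'
print(neq_mask)   # FALSE NA TRUE FALSE

# %in% is built on match(), which reports "no match" for the NA element here,
# so the negated mask contains only TRUE and FALSE.
in_mask <- !(foo$bar %in% 'a')
print(in_mask)    # FALSE TRUE TRUE FALSE
```

Because in_mask contains no NA, indexing with it returns real rows only; note that it keeps the row where bar is NA.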
Update:
The behaviour isn't caused by the character filter. It's actually because NA is used to index the data frame.
> foo[c(F,NA,T,F),]
    bar baz
NA <NA>  NA
3     b   3
Passing NA as an index value yields a row filled with NA:
> foo[NA,]
      bar baz
NA   <NA>  NA
NA.1 <NA>  NA
NA.2 <NA>  NA
NA.3 <NA>  NA
> foo[c(T,NA),]
      bar baz
1       a   1
NA   <NA>  NA
3       b   3
NA.1 <NA>  NA
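Given that an NA in the index is what produces the all-NA rows, one option is to state explicitly how missing values should be treated before indexing. The sketch below is not part of the original answer; the names drop_na, drop_na2 and keep_na are hypothetical, and which choice is appropriate depends on whether rows with a missing bar should count as "not a".

```r
foo <- data.frame(bar = c('a', NA, 'b', 'a'),
                  baz = 1:4,
                  stringsAsFactors = FALSE)

# Drop rows where bar is missing: which() returns only the positions
# of TRUE, so the NA in the comparison result is discarded.
drop_na <- foo[which(foo$bar != 'a'), ]

# subset() also treats an NA condition as FALSE, giving the same rows.
drop_na2 <- subset(foo, bar != 'a')

# Keep rows where bar is missing: test for NA explicitly.
# TRUE | NA evaluates to TRUE, so the mask has no NA left in it.
keep_na <- foo[is.na(foo$bar) | foo$bar != 'a', ]

print(drop_na)
print(keep_na)
```

Here drop_na and drop_na2 contain only row 3, while keep_na contains rows 2 and 3, the same rows as the %in% version.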