Why does dplyr's filter drop NA values from a factor variable?
Question
When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:
library(dplyr)
set.seed(919)
(dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = T))))
# var1
# 1 <NA>
# 2 3
# 3 3
# 4 1
# 5 1
# 6 <NA>
# 7 2
# 8 2
# 9 <NA>
# 10 1
filter(dat, var1 != 1)
# var1
# 1 3
# 2 3
# 3 2
# 4 2
This does not seem ideal; I only wanted to drop the rows where var1 == 1.
It looks like this is occurring because any comparison with NA returns NA, which filter then drops. So, for example, filter(dat, !(var1 %in% 1)) produces the correct results. But is there a way to tell filter not to drop the NA values?
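A minimal base-R sketch of the logic described above, using a small hypothetical factor rather than the seeded data: != returns NA wherever the input is NA, whereas %in% is built on match(), which reports "no match" (FALSE) for NA when the lookup table has no NA, so its negation keeps those rows. which() stands in here for filter()'s rule of keeping only rows whose condition is TRUE; it is an illustration, not how dplyr is implemented.

```r
# Hypothetical factor with missing values, similar in shape to var1 above
var1 <- factor(c(NA, 3, 1, NA, 2))

# != propagates NA: the result is NA wherever var1 is NA
cond_ne <- var1 != 1
print(cond_ne)   # NA TRUE FALSE NA TRUE

# %in% never returns NA, so NA %in% 1 is FALSE and its negation is TRUE
cond_in <- !(var1 %in% 1)
print(cond_in)   # TRUE TRUE FALSE TRUE TRUE

# Keeping only TRUE positions discards the NA results as well,
# which is why the != version loses the NA rows
print(which(cond_ne))   # 2 5
print(which(cond_in))   # 1 2 4 5
```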
Answer
You can use:
filter(dat, var1 != 1 | is.na(var1))
var1
1 <NA>
2 3
3 3
4 <NA>
5 2
6 2
7 <NA>
This keeps the NA rows instead of dropping them.
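The fix works because R's | uses three-valued logic: NA | TRUE is TRUE, so OR-ing in is.na(var1) turns each NA produced by the comparison into TRUE while the rows equal to 1 stay FALSE. A base-R sketch on a small hypothetical data frame; indexing with which() only imitates filter()'s keep-if-TRUE rule.

```r
# Hypothetical data frame shaped like dat in the question
dat <- data.frame(var1 = factor(c(NA, 3, 1, NA, 2)))

# NA | TRUE is TRUE, so rows where var1 is NA now pass the condition
keep <- dat$var1 != 1 | is.na(dat$var1)
print(keep)   # TRUE TRUE FALSE TRUE TRUE

# Keep the rows where the condition is TRUE, as filter() would
res <- dat[which(keep), , drop = FALSE]
print(res)
```

Unlike filter(), which renumbers the rows, base indexing keeps the original row names (1, 2, 4, 5 here).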
Also, just for completeness: dropping NAs is the intended behavior of filter, as the following test shows:
test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})
The test above is taken from