Unable to subset (filter) a data frame due to NAs


Question

Why doesn't dplyr's filter in the code below return the same data.frame as base R subsetting?

In fact, neither of them works as expected. I'd like to remove the observations/rows for which b == 1 and c == 1 hold simultaneously. That is, I'd like to remove only the third row.

library(dplyr)
df <- data.frame(a=c(0,0,0,0,1,1,1),
  b=c(0,0,1,1,0,0,1),
  c=c(1,NA,1,NA,1,NA,NA))

# dplyr version
filter(df, !(b == 1 & c == 1))

# base R version
df[!(df$b == 1 & df$c == 1), ]

Recommended answer

Use complete.cases to turn the NA entries of the resulting logical vector into FALSE, so that the corresponding rows are kept after the negation. This relies on the fact that NA & FALSE is FALSE:

filter(df, !(b == 1 & c == 1 & complete.cases(df[c('b', 'c')])))

#   a b  c
# 1 0 0  1
# 2 0 0 NA
# 3 0 1 NA
# 4 1 0  1
# 5 1 0 NA
# 6 1 1 NA
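
The same idea carries over to base R indexing. The following is a minimal sketch, assuming the df from the question; the names drop and kept are introduced here purely for illustration. It sets the NA entries of the condition to FALSE explicitly instead of going through complete.cases:

```r
# Data from the question
df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# TRUE where both comparisons are TRUE, NA where the result is unknown
drop <- df$b == 1 & df$c == 1

# Treat "unknown" as "do not drop" (hypothetical helper vector, see lead-in)
drop[is.na(drop)] <- FALSE

# Negate and subset; only the third row is removed
kept <- df[!drop, ]
print(kept)
```

Unlike the filter result shown above, base R indexing keeps the original row names, so the remaining rows are labelled 1, 2, 4, 5, 6 and 7.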

A few more logical operations involving NA are shown here. They look a little confusing at first glance, but they are consistent: the result is NA only when the unknown value could change the outcome.

NA & FALSE
# [1] FALSE
NA | TRUE
# [1] TRUE
NA & TRUE
# [1] NA
NA | FALSE
# [1] NA
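
These rules explain why the two attempts in the question differ. As a rough sketch, the condition can be evaluated by hand on the b and c columns from the question (the vector names b, c_col, cond and keep are illustrative only):

```r
# Columns b and c from the question's data frame
b     <- c(0, 0, 1, 1, 0, 0, 1)
c_col <- c(1, NA, 1, NA, 1, NA, NA)

# FALSE & NA is FALSE, but TRUE & NA is NA, so elements 4 and 7 are NA
cond <- b == 1 & c_col == 1
print(cond)

# Negation cannot resolve an unknown value: !NA is still NA
keep <- !cond
print(keep)
```

Here keep is TRUE for elements 1, 2, 5 and 6, FALSE for element 3, and NA for elements 4 and 7. filter() keeps only the rows where the condition is TRUE, so it silently drops rows 4 and 7, whereas base R indexing with an NA in a logical index returns a row filled with NA for each of them.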
