NA in `i` expression of data.table (possible bug)
Question
When `i` contains `NA`, that particular row is not returned. I am not sure whether this is the intended behavior.
require(data.table)
x = data.table(a=c(NA, 1:3, NA))
x[a>0]
a
1: 1
2: 2
3: 3
x[!(a>0)]
a
1: NA
2: NA
x[a<0]
Empty data.table (0 rows) of 1 col: a
x[!(a<0)]
a
1: NA
2: 1
3: 2
4: 3
5: NA
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.8
Recommended answer
As @flodel points out, the question can be simplified to: why is this not `TRUE`?
identical(x[as.logical(a)], x[!!as.logical(a)]) # note the double bangs
The answer lies in how data.table handles `NA` in `i` and how it handles `!` in `i`, both of which receive special treatment. The problem really arises in the combination of the two.
- `NA`s in `i` are treated as `FALSE`.
- `!` in `i` is treated as a negation.
This is well documented in `?data.table` (as G. Grothendieck points out in another answer). The relevant portions are:
integer and logical vectors work the same way they do in [.data.frame. Other than NAs in logical i are treated as FALSE and a single NA logical is not recycled to match the number of rows, as it is in [.data.frame.
...
All types of 'i' may be prefixed with !. This signals a not-join or not-select should be performed. Throughout data.table documentation, where we refer to the type of 'i', we mean the type of 'i' after the '!', if present.
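To make the first quoted passage concrete, here is a minimal sketch in base R only (no data.table call is made). It contrasts how `[.data.frame` treats `NA` in a logical `i` with an emulation of the "NA is treated as FALSE" rule. The variable names `keep` and `keep_na_false` are illustrative assumptions, not anything from data.table.

```r
df <- data.frame(a = c(NA, 1:3, NA))

# Logical index containing NA in the first and last positions.
keep <- df$a > 0

# Base R: an NA in a logical index produces a row of NAs, so 5 rows come back.
base_rows <- df[keep, , drop = FALSE]

# Emulating the documented rule: turn NA into FALSE before subsetting.
keep_na_false <- keep & !is.na(keep)
na_as_false_rows <- df[keep_na_false, , drop = FALSE]

nrow(base_rows)         # 5 (two of the rows are all-NA)
na_as_false_rows$a      # 1 2 3
```

The second result has the same rows as `x[a>0]` in the question.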
If you look at the code for `[.data.table`, a leading `!`, if present, is handled by:
1. Removing the preceding `!`
2. Interpreting the remaining `i`
3. Negating the interpretation
The way `NA`s are handled is by setting those values to `FALSE`. However, and very importantly, this happens within step 2 above.
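The order of operations described above can be sketched in base R. This is a hypothetical emulation, not data.table's actual source: `select_rows` is a name made up for illustration, and it works on a plain data.frame. It only shows why replacing `NA` with `FALSE` before the negation makes those rows come back.

```r
# Hypothetical helper (NOT data.table code): mimics the described steps.
select_rows <- function(i_expr, data) {
  # Step 1: detect and remove a single leading `!`.
  negate <- is.call(i_expr) && identical(i_expr[[1L]], as.name("!"))
  if (negate) i_expr <- i_expr[[2L]]

  # Step 2: interpret the remaining i; NA is set to FALSE at this point.
  keep <- eval(i_expr, data)
  keep[is.na(keep)] <- FALSE

  # Step 3: negate the interpretation if a leading `!` was removed.
  if (negate) keep <- !keep

  data[keep, , drop = FALSE]
}

df <- data.frame(a = c(NA, 1:3, NA))

select_rows(quote(a > 0), df)$a      # 1 2 3
select_rows(quote(!(a > 0)), df)$a   # NA NA
select_rows(quote(!(a < 0)), df)$a   # NA 1 2 3 NA
```

Under these assumptions the sketch reproduces the three outputs shown in the question for data.table 1.8.8. Other versions of data.table may behave differently.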
Thus, what really happens is that when `i` contains `NA` and `i` is prefixed by `!`, the `NA`s are effectively interpreted as `TRUE`. While this is technically as documented, I am not sure whether it is as intended.
Of course, there is the final question of @flodel's point: why is `x[as.logical(a)]` not the same as `x[!!as.logical(a)]`? The reason is that only the first bang gets special treatment. The second bang is interpreted as normal by R.
Since `!NA` is still `NA`, the sequence of modifications for the interpretation of `!!(NA)` is:
!!(NA)
!( !(NA) )
!( NA )
!( FALSE )
TRUE
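The same trace can be followed step by step in plain R. This is a small sketch that assumes the handling described in this answer (inner bang evaluated normally, `NA` set to `FALSE`, then the outer negation applied). The variable names are illustrative.

```r
# In ordinary R, negating NA leaves it NA, however many bangs are used.
plain_single <- !NA
plain_double <- !!NA

# Emulated handling of `!!(NA)` as described above:
inner <- !NA                      # second bang, evaluated normally: NA
inner[is.na(inner)] <- FALSE      # NA is set to FALSE
result <- !inner                  # first bang, applied last as a negation

result   # TRUE
```

So in ordinary R `!!NA` stays `NA`, while the emulated sequence ends in `TRUE`, which is why the doubly negated subset keeps the `NA` rows.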