NA in `i` expression of data.table (possible bug)


Question

When i contains NA, the corresponding row is not returned, but negating the same expression with ! does return it. Is this the intended behavior?

require(data.table)
x = data.table(a=c(NA, 1:3, NA))    
x[a>0]       
   a
1: 1
2: 2
3: 3

x[!(a>0)]
    a
1: NA
2: NA

x[a<0]   
Empty data.table (0 rows) of 1 col: a

x[!(a<0)]
    a
1: NA
2:  1
3:  2
4:  3
5: NA

 > sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
  [7] LC_PAPER=C                 LC_NAME=C                 
  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

 attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

 other attached packages:
  [1] data.table_1.8.8


Recommended answer

As @flodel points out, the question can be simplified to: why is this not TRUE?

identical(x[as.logical(a)], x[!!as.logical(a)])   # note the double bangs

The answer lies in how data.table handles NA in i and how it handles ! in i. Both receive special treatment, and the problem arises from the combination of the two.


  • NAs in i are treated as FALSE.
  • A leading ! in i is treated as a negation (a not-join or not-select).

This is well documented in ?data.table (as G. Grothendieck points out in another answer). The relevant portions are:


integer and logical vectors work the same way they do in [.data.frame. Other than NAs in logical i are treated as FALSE and a single NA logical is not recycled to match the number of rows, as it is in [.data.frame.
...
All types of 'i' may be prefixed with !. This signals a not-join or not-select should be performed. Throughout data.table documentation, where we refer to the type of 'i', we mean the type of 'i' after the '!', if present.
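
To make the contrast with [.data.frame concrete, here is a small base R sketch. It does not call data.table at all; which() is used only as a stand-in that approximates the "NA treated as FALSE" rule quoted above, and the variable names are illustrative.

```r
# Base R only: how [.data.frame treats NA in a logical row index.
df <- data.frame(a = c(NA, 1:3, NA))

# A logical index containing NA yields an all-NA row for each NA.
keep_na <- df[df$a > 0, , drop = FALSE]

# which() drops NA, approximating "NA in logical i is treated as FALSE".
drop_na <- df[which(df$a > 0), , drop = FALSE]

nrow(keep_na)  # 5
nrow(drop_na)  # 3
```

The second form is what the documentation describes for a logical i without a leading !.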

If you look at the code for [.data.table, a leading !, if present, is handled by:


  1. Removing the leading !

  2. Interpreting the remaining i

  3. Negating that interpretation

NAs are handled by setting those values to FALSE.
However, and this is very important, that happens within step 2 above, before the negation in step 3.
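
The order of operations described above can be mimicked in base R. This is only a sketch of the described behavior, not data.table's actual code; select_rows and its negate argument are hypothetical names introduced for illustration.

```r
a <- c(NA, 1:3, NA)

# Hypothetical helper mimicking the described handling of a logical `i`.
# `negate = TRUE` plays the role of a leading `!` that has been stripped.
select_rows <- function(cond, negate = FALSE) {
  cond[is.na(cond)] <- FALSE   # step 2: NA is treated as FALSE
  if (negate) cond <- !cond    # step 3: negation is applied afterwards
  which(cond)
}

a[select_rows(a > 0)]                 # 1 2 3
a[select_rows(a > 0, negate = TRUE)]  # NA NA
a[select_rows(a < 0)]                 # integer(0)
a[select_rows(a < 0, negate = TRUE)]  # NA 1 2 3 NA
```

These four results match the four outputs shown in the question, which supports the explanation that the NA replacement happens before the negation.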

Thus, when i contains NA and i is prefixed by !, the NAs are effectively interpreted as TRUE. While this is technically as documented, I am not sure it is as intended.

Finally, there is @flodel's point: why is x[as.logical(a)] not the same as x[!!as.logical(a)]? The reason is that only the first bang gets special treatment. The second bang is evaluated normally by R.

Since !NA is still NA, the sequence of steps in the interpretation of !!(NA) is:

!!(NA)  
!( !(NA) )  
!(  NA   )
!( FALSE )
TRUE
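
The same sequence can be traced on a plain logical vector in base R. This is a sketch of the described interpretation, assuming the outer bang is stripped and applied last; it does not invoke data.table, and the variable names are illustrative.

```r
x <- c(NA, TRUE, FALSE)

plain <- !!x                    # ordinary R: NA TRUE FALSE

inner <- !x                     # second bang, evaluated normally: NA FALSE TRUE
inner[is.na(inner)] <- FALSE    # NA treated as FALSE: FALSE FALSE TRUE
with_bang <- !inner             # stripped leading bang applied last: TRUE TRUE FALSE

no_bang <- x
no_bang[is.na(no_bang)] <- FALSE  # without any bang: FALSE TRUE FALSE
```

The first element is selected in with_bang but not in no_bang, which is why the two expressions in the identical() call above can differ.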
