DT[!(x == .)] and DT[x != .] treat NA in x inconsistently

Question

This is something that I thought I should ask following this question. I'd like to confirm whether this is a bug/inconsistency before filing it as such in the R-forge tracker.

Consider this data.table:

require(data.table)
DT <- data.table(x=c(1,0,NA), y=1:3)

Now, to access all rows of the DT that are not 0, we could do it in these ways:

DT[x != 0]
#    x y
# 1: 1 1
DT[!(x == 0)]
#     x y
# 1:  1 1
# 2: NA 3

Accessing DT[x != 0] and DT[!(x==0)] gives different results when the underlying logical operation is equivalent.

Note: Converting this into a data.frame and running these operations will give results that are identical with each other for both logically equivalent operations, but that result is different from both these data.table results. For an explanation of why, look at ?`[` under the section NAs in indexing.

Edit: Since some of you have stressed equality with data.frame, here is the output of the same operations on a data.frame:

DF <- as.data.frame(DT)
# check ?`[` under the section `NAs in indexing` as to why this happens
DF[DF$x != 0, ]
#     x  y
# 1   1  1
# NA NA NA
DF[!(DF$x == 0), ]
#     x  y
# 1   1  1
# NA NA NA
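The all-NA row above comes from base R's rule that an NA in a logical index yields an NA row. A minimal base-R sketch of that rule follows; the names `idx`, `with_na` and `without_na` are introduced here for illustration, and `which()` is shown only as one common way to drop the NA positions:

```r
# Base R only; DF mirrors the example data from the question.
DF <- data.frame(x = c(1, 0, NA), y = 1:3)

# The comparison propagates NA: TRUE, FALSE, NA.
idx <- DF$x != 0

# Indexing with an NA in the logical vector returns an all-NA row for it.
with_na <- DF[idx, ]

# which() keeps only the positions that are TRUE, so the NA row disappears.
without_na <- DF[which(idx), ]
```

Here `with_na` has two rows (the row where x is 1, plus an all-NA row) and `without_na` has only the row where x is 1.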

I think this is an inconsistency and both should provide the same result. But, which result? The documentation for [.data.table says:


i ---> Integer, logical or character vector, expression of column names, list or data.table.

integer and logical vectors work the same way they do in [.data.frame. Other than NAs in logical i are treated as FALSE and a single NA logical is not recycled to match the number of rows, as it is in [.data.frame.
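The data.frame side of the recycling remark in the quoted documentation can be checked in base R. This is a small sketch under the assumption that `DF` holds the same three rows as the question's data; it does not exercise data.table itself:

```r
# Base R only; DF mirrors the example data from the question.
DF <- data.frame(x = c(1, 0, NA), y = 1:3)

# A bare NA is a logical constant, so [.data.frame recycles it
# to one NA index per row.
recycled <- DF[NA, ]
nrow(recycled)   # 3

# An integer NA is a single index and is not recycled.
single <- DF[NA_integer_, ]
nrow(single)     # 1
```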

It's clear why the results are different from what one would get from doing the same operation on a data.frame. But still, within data.table, if this is the case, then both of them should return:

#    x y
# 1: 1 1

I went through the [.data.table source code and now understand why this is happening. See this post for a detailed explanation.

Briefly, x != 0 evaluates to a logical vector and its NA gets replaced with FALSE. With !(x == 0), however, (x == 0) is first evaluated to a logical vector and its NA is replaced with FALSE; only then does the negation happen, so the NA effectively becomes TRUE.
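The order of operations described here can be modelled in base R. This is only a sketch of the logic as stated in the question, not data.table's actual source; `na_to_false` is a hypothetical helper introduced for illustration:

```r
x <- c(1, 0, NA)

# Hypothetical helper (not a data.table function): treat NA in a
# logical index as FALSE.
na_to_false <- function(lgl) {
  lgl[is.na(lgl)] <- FALSE
  lgl
}

# Case 1: the whole expression x != 0 is evaluated, then NA becomes FALSE.
keep_neq <- na_to_false(x != 0)       # TRUE FALSE FALSE

# Case 2: x == 0 is evaluated, NA becomes FALSE, and only then is the
# result negated, so the former NA ends up TRUE.
keep_not_eq <- !na_to_false(x == 0)   # TRUE FALSE TRUE

# Replacing NA after the negation instead gives the same answer as case 1.
keep_consistent <- na_to_false(!(x == 0))  # TRUE FALSE FALSE
```

The third element is the only one that differs between the two cases, which matches the extra NA row in the DT[!(x == 0)] output.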

So, my first (or rather main) question is, is this a bug/inconsistency? If so, I'll file it as one in data.table R-forge tracker. If not, I'd like to know the reason for this difference and I would like to suggest a correction to the documentation explaining this difference (to the already amazing documentation!).

Edit: Following up on the comments, the second question is: should data.table's handling of subsetting by columns containing NA resemble that of data.frame? (But I agree with @Roland's comment that this may very well lead to opinions, and I'm perfectly fine with this question not being answered at all.)

Recommended answer

As of version 1.8.11 the ! does not trigger a not-join for logical expressions and the results for the two expressions are the same:

DT <- data.table(x=c(1,0,NA), y=1:3)
DT[x != 0]
#   x y
#1: 1 1
DT[!(x == 0)]
#   x y
#1: 1 1

A couple of other expressions mentioned in @mnel's answer also behave in a more predictable fashion now:

DT[!(x != 0)]
#   x y
#1: 0 2
DT[!!(x == 0)]
#   x y
#1: 0 2
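These results are what one would expect if the whole logical expression is evaluated first and NA is treated as FALSE only at the end. The following base-R sketch models that reading; it is an assumption about the behaviour, not data.table code, and `selected_rows` is a hypothetical helper:

```r
x <- c(1, 0, NA)

# Hypothetical helper: row numbers kept when NA in the fully evaluated
# logical index counts as FALSE. which() ignores NA, which has the same effect.
selected_rows <- function(lgl) which(lgl)

rows_neq     <- selected_rows(x != 0)      # row 1
rows_not_eq  <- selected_rows(!(x == 0))   # row 1
rows_not_neq <- selected_rows(!(x != 0))   # row 2
rows_notnot  <- selected_rows(!!(x == 0))  # row 2
```

Under this model the NA row (row 3) is never selected, whichever way the condition is written.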
