DT[!(x == .)] and DT[x != .] treat NA in x inconsistently

Question

This is a follow-up to this question. I'd like to confirm whether this is a bug or inconsistency before filing it as such in the R-forge tracker.

Consider this data.table:

require(data.table)
DT <- data.table(x=c(1,0,NA), y=1:3)

Now, to access all rows of DT where x is not 0, we could do it in either of these ways:

DT[x != 0]
#    x y
# 1: 1 1
DT[!(x == 0)]
#     x y
# 1:  1 1
# 2: NA 3

DT[x != 0] and DT[!(x == 0)] give different results even though the underlying logical operations are equivalent.

Note: Converting this into a data.frame and running the same operations gives results that are identical to each other for the two logically equivalent operations, but that result differs from both of the data.table results. For an explanation of why, see the section "NAs in indexing" in ?`[`.

Since some of you have stressed equality with data.frame, here is the output of the same operations on a data.frame:

DF <- as.data.frame(DT)
# check ?`[` under the section `NAs in indexing` as to why this happens
DF[DF$x != 0, ]
#     x  y
# 1   1  1
# NA NA NA
DF[!(DF$x == 0), ]
#     x  y
# 1   1  1
# NA NA NA
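
The all-NA row in the data.frame output comes from base R's handling of NA in a logical index. A minimal sketch in base R, assuming only the values from the example above (the variable name keep is introduced here for illustration), shows the index vector and one common way to drop the NA entry with which():

```r
x  <- c(1, 0, NA)
DF <- data.frame(x = x, y = 1:3)

# The comparison propagates NA instead of returning FALSE
keep <- x != 0
print(keep)

# An NA in a logical index selects an all-NA row
print(DF[keep, ])

# which() returns only the positions that are TRUE, so the NA entry is dropped
print(DF[which(keep), ])
```

Wrapping the condition in which() is a design choice for cases where NA should count as "no match"; it is not what [.data.frame does by default.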

I think this is an inconsistency and that both should give the same result. But which result? The documentation for [.data.table says:

i ---> Integer, logical or character vector, expression of column names, list or data.table.

integer and logical vectors work the same way they do in [.data.frame. Other than NAs in logical i are treated as FALSE and a single NA logical is not recycled to match the number of rows, as it is in [.data.frame.

This makes it clear why the results differ from what the same operation returns on a data.frame. But within data.table, if NAs in a logical i are treated as FALSE, then both expressions should return:

#    x y
# 1: 1 1

I went through the [.data.table source code and now understand why this happens. See this post for a detailed explanation.

Briefly, x != 0 is evaluated to a logical vector and NA is replaced with FALSE. In !(x == 0), however, (x == 0) is evaluated to a logical vector first and NA is replaced with FALSE; the negation is applied only afterwards, so the NA effectively becomes TRUE.
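
The order of operations described here can be emulated in base R. This is only a sketch of the description, not data.table's internal code; the helper na_to_false and the variable names are hypothetical and introduced for illustration.

```r
x <- c(1, 0, NA)

# Hypothetical helper: treat NA in a logical vector as FALSE
na_to_false <- function(v) {
  v[is.na(v)] <- FALSE
  v
}

# DT[x != 0]: compare, then replace NA with FALSE
rows_ne <- which(na_to_false(x != 0))

# DT[!(x == 0)] as described: replace NA with FALSE first, negate afterwards
rows_not_eq <- which(!na_to_false(x == 0))

# Negating first and replacing NA last would make the two forms agree
rows_consistent <- which(na_to_false(!(x == 0)))

print(rows_ne)
print(rows_not_eq)
print(rows_consistent)
```

With the replacement applied before the negation, row 3 (the NA row) is selected; with the replacement applied last, only row 1 is selected, which matches DT[x != 0].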

So my first (or rather, main) question is: is this a bug or inconsistency? If so, I'll file it as one in the data.table R-forge tracker. If not, I'd like to know the reason for the difference, and I'd like to suggest a correction to the (already amazing) documentation that explains it.

Following up on the comments, the second question is: should data.table's handling of subsetting by columns containing NA resemble that of data.frame? (I agree with @Roland's comment that this may well be a matter of opinion, and I'm perfectly fine with this question not being answered at all.)

Recommended answer

As of version 1.8.11, ! no longer triggers a not-join for logical expressions, and the two expressions return the same result:

DT <- data.table(x=c(1,0,NA), y=1:3)
DT[x != 0]
#   x y
#1: 1 1
DT[!(x == 0)]
#   x y
#1: 1 1

A couple of other expressions mentioned in @mnel's answer now also behave more predictably:

DT[!(x != 0)]
#   x y
#1: 0 2
DT[!!(x == 0)]
#   x y
#1: 0 2
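
When the treatment of NA matters, the condition can state it explicitly rather than rely on the default. The following is a sketch, assuming data.table 1.8.11 or later is installed; the names no_na and with_na are introduced here for illustration.

```r
library(data.table)

DT <- data.table(x = c(1, 0, NA), y = 1:3)

# Rows where x is known and not 0 (NA rows excluded explicitly)
no_na <- DT[!is.na(x) & x != 0]

# Rows where x is not 0 or is unknown (NA rows included explicitly)
with_na <- DT[is.na(x) | x != 0]

print(no_na)
print(with_na)
```

Spelling out is.na(x) makes the intent visible to the reader and does not depend on how a particular version treats NA in a logical i.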
