Operator == inconsistent in logical columns in data.table
Problem description
Please see the following reproducible example:
library(data.table)
set.seed(123)
DT <- data.table(A=rep(0.3,10000))
DT[, B := runif(.N) < A]
DT[B == T, .N]
# [1] 3005
DT[, summary(B)]
# Mode FALSE TRUE NA's
# logical 6995 3005 0
Everything looks fine and the count of "TRUE" values is the same for the 2 methods. Now replace col B with a new one.
DT[, B := runif(.N) < A]
DT[B == T, .N]
# [1] 3331
DT[, summary(B)]
# Mode FALSE TRUE NA's
# logical 6981 3019 0
The count of 'T' in the column B is different!!! It is the same column but one method gives 3331 "TRUE" values and the other 3019.
When bypassing ==, the count is right:
DT[B != F, .N]
# [1] 3019
DT[, summary(B)]
# Mode FALSE TRUE NA's
# logical 6981 3019 0
This is correct.
I can reproduce it with data.table v1.9.4 and 1.9.5 on Windows 8.1 x64.
Here's a much easier reproducible example without runif().
require(data.table) ## 1.9.4+
DT = data.table(x = 1:5)
DT[, y := x <= 2L]
# x y
# 1: 1 TRUE
# 2: 2 TRUE
# 3: 3 FALSE
# 4: 4 FALSE
# 5: 5 FALSE
DT[y == TRUE, .N]
# [1] 2 <~~~~~~ correct result.
DT[, y := x <= 3L]
# x y
# 1: 1 TRUE
# 2: 2 TRUE
# 3: 3 TRUE
# 4: 4 FALSE
# 5: 5 FALSE
DT[y == TRUE, .N]
# [1] 2 <~~~~~~ incorrect result, should be 3!
Recommended answer
Now fixed in v1.9.5 on GitHub.
:= and set* now drop secondary keys (new in v1.9.4) so that DT[x==y] works again after a := or set* without needing options(datatable.auto.index=FALSE). Only setkey() was dropping secondary keys correctly. 23 tests added. Thanks to user36312 for reporting, #885.
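
For anyone stuck on an affected version, here is a minimal sketch of the workaround the note above mentions, assuming nothing beyond the option it names. It reruns the small example from the question with automatic indexing switched off and cross-checks the == count against two forms that do not go through the == path. The variable names (old, n_first, n_eq, n_paren, n_sum) are made up for illustration.

```r
library(data.table)

# Workaround named in the fix note: disable automatic secondary indexing.
# options() returns the previous setting so it can be restored afterwards.
old <- options(datatable.auto.index = FALSE)

DT <- data.table(x = 1:5)
DT[, y := x <= 2L]
n_first <- DT[y == TRUE, .N]   # count before the column is overwritten

DT[, y := x <= 3L]             # overwrite the logical column
n_eq    <- DT[y == TRUE, .N]   # the form that misbehaved in the question
n_paren <- DT[(y), .N]         # plain logical subset, no == involved
n_sum   <- DT[, sum(y)]        # direct count of TRUE values

options(old)                   # restore the previous setting

print(c(first = n_first, eq = n_eq, paren = n_paren, sum = n_sum))
```

If the three counts taken after the overwrite agree, the == subset is behaving. On a version that includes the fix, they should also agree with the option left at its default.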