R data.table: select rows (integer comparison)
Problem description
When trying to select rows in a data.table (package for R) by specifying the value of a field consisting of large integers, I get strange results. Namely, similar integers are selected too.
require(data.table)
options(digits=15)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))
Try to access the first row by checking the value of A:
data[A==1000200030001]
A
1: 1000200030001
2: 1000200030002
3: 1000200030003
All three rows are returned, not just the first. The problem goes away when using as.numeric:
data[as.numeric(A)==1000200030001]
A
1: 1000200030001
The problem is not present in the j part of data.table:
data[,A == 1000200030001]
[1] TRUE FALSE FALSE
This seems to be a problem with the precision of comparing large numbers. I am very confused that using as.numeric solves the issue, since str(data) shows that A is already of type numeric:
str(data)
Classes ‘data.table’ and 'data.frame': 3 obs. of 1 variable:
$ A: num 1e+12 1e+12 1e+12
- attr(*, ".internal.selfref")=<externalptr>
- attr(*, "index")= atomic
..- attr(*, "A")= int
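As a quick sanity check, the comparison can be repeated in base R without data.table. This is only a sketch, and the variable names are illustrative. It suggests that the values themselves compare correctly as doubles, which is consistent with the j result above and points at how the i expression is processed instead.

```r
# Base R only; no data.table involved.
x <- c(1000200030001, 1000200030002, 1000200030003)

# The literals are doubles, not R integers: they exceed .Machine$integer.max.
is_double   <- is.double(x)
exceeds_int <- all(x > .Machine$integer.max)

# Doubles represent every integer below 2^53 exactly, so the values stay distinct.
below_2_53 <- all(x < 2^53)
cmp <- x == 1000200030001

print(cmp)  # [1]  TRUE FALSE FALSE
```

Because all three values are far below 2^53, a plain double comparison has no precision problem here.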
Any hints on how to ensure this problem does not appear in production code are appreciated!
UPDATE: The problem described above is not present when auto-indexing is disabled.
options(datatable.auto.index=FALSE)
However, problems with aggregation and merging/joining are not solved by disabling auto-indexing:
data[,.(B=sum(A)),A]
A B
1: 1000200030001 1000200030001
The correct output would be:
A B
1: 1000200030001 1000200030001
2: 1000200030002 1000200030002
3: 1000200030003 1000200030003
I found that the best solution to all of these problems is to use the bit64 package, as described in the selected answer. Thanks everybody!
Recommended answer
Use bit64::integer64:
require(data.table)
options(digits=15)
library(bit64)
data <- fread("A
1000200030001
1000200030002
1000200030003", colClasses = "integer64")
data[A == as.integer64("1000200030001")]
#A
#1: 1000200030001
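The same idea can be tried on a table built directly in R, including the grouping case from the question. This is a minimal sketch, assuming the data.table and bit64 packages are installed. Building the column from strings is an illustrative choice, not something the answer prescribes, and the names selected and grouped are hypothetical.

```r
library(data.table)
library(bit64)

# Build the column from strings so no value passes through a double first.
data <- data.table(A = as.integer64(c("1000200030001",
                                      "1000200030002",
                                      "1000200030003")))

# Row selection on the integer64 column.
selected <- data[A == as.integer64("1000200030001")]

# Grouping on the integer64 column, as in the aggregation example from the question.
grouped <- data[, .(B = sum(A)), by = A]

print(selected)
print(grouped)
```

If integer64 behaves as the answer describes, selected should contain one row and grouped one row per distinct value of A.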
Alternatively, deactivate auto-indexing (and lose the performance advantage from it):
options(datatable.auto.index=FALSE)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))
data[(A==1000200030001)]
# A
#1: 1000200030001