R data.table select rows (integer comparison)

Question

When I try to select rows in a data.table (the R package) by specifying the value of a field that consists of large integers, I get strange results: rows with similar integers are selected too.

require(data.table)
options(digits=15)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))

Try to access the first row by checking the value of A:

data[A==1000200030001]
               A
1: 1000200030001
2: 1000200030002
3: 1000200030003

All three rows are returned, not just the first.

The problem goes away when using as.numeric:

data[as.numeric(A)==1000200030001]
               A
1: 1000200030001

The problem does not occur in the j part of data.table:

data[,A == 1000200030001]
[1]  TRUE FALSE FALSE

This seems to be a problem with the precision of comparing large numbers. I am very confused that using as.numeric solves the issue, since str(data) shows that A is already of type numeric:

str(data)
Classes ‘data.table’ and 'data.frame':  3 obs. of  1 variable:
 $ A: num  1e+12 1e+12 1e+12
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "index")= atomic  
  ..- attr(*, "A")= int 
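For what it's worth, a quick base R check (a sketch added for illustration, not part of the original post) suggests the values themselves are not the problem: they are well below 2^53, so a double stores each of them exactly, and a plain vector comparison distinguishes them, matching the j result above.

```r
# The three values from the question, stored as ordinary doubles.
A <- c(1000200030001, 1000200030002, 1000200030003)

# Doubles represent every integer up to 2^53 exactly.
exact <- all(A < 2^53)

# A plain base R comparison therefore matches only the first element.
hits <- A == 1000200030001

print(is.double(A))
print(exact)
print(hits)
```

So any loss of distinction presumably happens inside data.table's indexed or grouped lookup rather than in R's own comparison.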

Any hints on how to ensure this problem does not appear in production code are appreciated!

UPDATE: The problem described above does not occur when auto-indexing is disabled:

options(datatable.auto.index=FALSE)

However, disabling auto-indexing does not solve the problems with aggregation and merging/joining:

data[,.(B=sum(A)),A]
               A             B
1: 1000200030001 1000200030001

The correct output would be:

               A             B
1: 1000200030001 1000200030001
2: 1000200030002 1000200030002
3: 1000200030003 1000200030003

I found that the best solution to all of these problems is to use the bit64 package, as described in the selected answer. Thanks everybody!

Recommended answer

Use bit64::integer64:

require(data.table)
options(digits=15)
library(bit64)
data <- fread("A
              1000200030001
              1000200030002
              1000200030003", colClasses = "integer64")


data[A == as.integer64("1000200030001")]
#A
#1: 1000200030001   

Alternatively, deactivate auto-indexing (and lose the performance advantage it provides):

options(datatable.auto.index=FALSE)
data <- data.table(A=c(1000200030001,1000200030002,1000200030003))
data[(A==1000200030001)]
#               A
#1: 1000200030001
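One possible explanation, which is an assumption and is not stated in the original answer, is that data.table rounds away the low-order bytes of doubles when it builds indexes or groups. The sketch below is a simplified base R model of dropping the last 2 bytes (16 bits) of the 52-bit mantissa. The helper round_low_bits is hypothetical and is not data.table's actual code.

```r
# Hypothetical helper: round x to the precision left after discarding
# the lowest `drop_bits` bits of a double's 52-bit mantissa.
round_low_bits <- function(x, drop_bits = 16) {
  exponent <- floor(log2(abs(x)))          # binary exponent of each value
  step <- 2^(exponent - 52 + drop_bits)    # spacing between representable values
  round(x / step) * step
}

A <- c(1000200030001, 1000200030002, 1000200030003)
rounded <- round_low_bits(A)

n_distinct_before <- length(unique(A))        # the original values all differ
n_distinct_after  <- length(unique(rounded))  # after rounding they coincide

print(n_distinct_before)
print(n_distinct_after)
```

Under this model the three values become indistinguishable, which would match both the extra rows in the indexed lookup and the single group in the aggregation. If that is the cause, data.table's setNumericRounding(0) should switch the rounding off, and getNumericRounding() reports the current setting. Check the documentation of your installed version for its default.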
