Error with large numerics in dcast.data.table


Question

Given a data frame, I am trying to cast from long to wide using the dcast.data.table function from library(data.table). However, when large numerics are used on the left side of the formula, distinct values somehow get combined into the same row.

Here is an example:

df <- structure(list(A = c(10000000007624, 10000000007619, 10000000007745, 
10000000007624, 10000000007767, 10000000007729, 10000000007705, 
10000000007711, 10000000007784, 10000000007745, 10000000007624, 
10000000007762, 10000000007762, 10000000007631, 10000000007762, 
10000000007619, 10000000007628, 10000000007705, 10000000007762, 
10000000007624, 10000000007745, 10000000007706, 10000000007767, 
10000000007777, 10000000007624, 10000000007745, 10000000007624, 
10000000007777, 10000000007771, 10000000007631, 10000000007624, 
10000000007640, 10000000007642, 10000000007708, 10000000007711, 
10000000007745, 10000000007767, 10000000007655, 10000000007722, 
10000000007745, 10000000007762, 10000000007771, 10000000007617
), B = c(4060697L, 7683673L, 7699192L, 1322422L, 7754939L, 7448486L, 
2188027L, 1061376L, 2095950L, 7793530L, 2095950L, 6447861L, 2188027L, 
7448451L, 7428427L, 7516354L, 7067801L, 2095950L, 6740142L, 405911L, 
4057215L, 1061345L, 7754945L, 7501748L, 2188027L, 7780980L, 6651988L, 
6649330L, 6655118L, 6556367L, 6463510L, 2347462L, 7675114L, 6556361L, 
1061345L, 7224099L, 6463515L, 2188027L, 6463515L, 7311234L, 7764971L, 
7224099L, 2347479L), C = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
3L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 25L, 2L, 1L, 2L, 
1L, 1L, 1L)), .Names = c("A", "B", "C"), row.names = c(NA, -43L
), class = "data.frame")

library(data.table)

df <- as.data.table(df)

output <- dcast.data.table(df, A ~ B, value.var = "C",
                           fun.aggregate = sum, fill = NA)

This produces only 2 rows, 10000000007624 and 10000000007784, and every value is summed into just those two.

The error does not occur with the reshape2::dcast function, which produces the correct result.

Is there a reason why dcast.data.table produces this error?

Recommended answer

The issue was raised on GitHub and answered by @jangorecki. The explanation below comes from the setNumericRounding help document:


"when joining or grouping, data.table rounds such data to apx 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand."

As such, my 14-digit numerics were being rounded and therefore combined: roughly 11 significant figures cannot distinguish values that differ only in their last three or four digits.
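The collapse into two groups can be reproduced approximately in base R. This is only a sketch of the arithmetic, not data.table's actual implementation: it assumes that dropping the last 2 bytes (16 bits) of the significand behaves like rounding to the nearest multiple of a fixed step. The names a, step and rounded are illustrative.

```r
# Distinct values of column A from the question
a <- c(10000000007617, 10000000007619, 10000000007624, 10000000007628,
       10000000007631, 10000000007640, 10000000007642, 10000000007655,
       10000000007705, 10000000007706, 10000000007708, 10000000007711,
       10000000007722, 10000000007729, 10000000007745, 10000000007762,
       10000000007767, 10000000007771, 10000000007777, 10000000007784)

# Values near 1e13 lie in [2^43, 2^44), so the spacing between adjacent
# doubles there is 2^(43 - 52). Ignoring the last 16 bits of the significand
# widens that spacing by a factor of 2^16, giving a step of 2^7 = 128.
step <- 2^(floor(log2(a[1])) - 52 + 16)

# Assumption: model the rounding as "nearest multiple of step"
rounded <- round(a / step) * step

format(unique(rounded), scientific = FALSE)
length(unique(rounded))  # 2 under this model, matching the two rows observed
```

Under this model every value from ...7617 to ...7729 lands on one multiple of 128 and every value from ...7745 to ...7784 lands on the next, which is consistent with the two-row result described in the question.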

As @jangorecki mentions, this can be avoided by calling setNumericRounding(0). However, I personally have re-classified my large numerics as factors, which makes more sense for my particular use case.
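Both workarounds might look roughly like the following. This is a minimal sketch on a small stand-in table rather than the full data from the question; dt, A_id, wide_numeric and wide_factor are illustrative names, and the generic dcast is used on the assumption that a reasonably recent data.table is installed, where it dispatches to the data.table method. It is worth checking getNumericRounding() first, since the default rounding level may differ between data.table versions.

```r
library(data.table)

# Stand-in data: the first two ids differ only in their last two digits
dt <- data.table(A = c(10000000007624, 10000000007619, 10000000007784),
                 B = c(1L, 2L, 2L),
                 C = c(1L, 1L, 3L))

# Option 1: disable rounding of numeric columns in grouping and joins
setNumericRounding(0)
wide_numeric <- dcast(dt, A ~ B, value.var = "C",
                      fun.aggregate = sum, fill = NA)

# Option 2: treat the ids as labels rather than numbers.
# sprintf("%.0f", ...) writes each id out in full before it becomes a factor.
dt[, A_id := factor(sprintf("%.0f", A))]
wide_factor <- dcast(dt, A_id ~ B, value.var = "C",
                     fun.aggregate = sum, fill = NA)

nrow(wide_numeric)  # one row per distinct id
nrow(wide_factor)
```

Option 2 does not depend on any session-wide setting, which may be preferable when the ids are identifiers that are never used in arithmetic.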

In addition, @jangorecki advises using the bit64 package when dealing with large numerics.
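A minimal sketch of the bit64 route, assuming the ids are whole numbers that fit in a 64-bit integer and that the installed data.table supports grouping on integer64 columns. The table dt and the column total are illustrative, and the ids are parsed from strings so that no double-precision step is involved.

```r
library(data.table)
library(bit64)

# Store the ids as 64-bit integers instead of doubles
dt <- data.table(
  A = as.integer64(c("10000000007624", "10000000007619", "10000000007784")),
  C = c(1L, 1L, 3L)
)

# Group on the integer64 column; each id should keep its own group
totals <- dt[, .(total = sum(C)), by = A]
nrow(totals)
```

Because integer64 values are exact integers, the floating-point rounding described above should not apply to them.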

