Error with large numerics in dcast.data.table
Problem description
Given a data frame, I am trying to cast from long to wide using the dcast.data.table function from library(data.table). However, when large numerics are used on the left side of the formula, the rows are somehow combined.
Here is an example:
library(data.table)

df <- structure(list(A = c(10000000007624, 10000000007619, 10000000007745,
10000000007624, 10000000007767, 10000000007729, 10000000007705,
10000000007711, 10000000007784, 10000000007745, 10000000007624,
10000000007762, 10000000007762, 10000000007631, 10000000007762,
10000000007619, 10000000007628, 10000000007705, 10000000007762,
10000000007624, 10000000007745, 10000000007706, 10000000007767,
10000000007777, 10000000007624, 10000000007745, 10000000007624,
10000000007777, 10000000007771, 10000000007631, 10000000007624,
10000000007640, 10000000007642, 10000000007708, 10000000007711,
10000000007745, 10000000007767, 10000000007655, 10000000007722,
10000000007745, 10000000007762, 10000000007771, 10000000007617
), B = c(4060697L, 7683673L, 7699192L, 1322422L, 7754939L, 7448486L,
2188027L, 1061376L, 2095950L, 7793530L, 2095950L, 6447861L, 2188027L,
7448451L, 7428427L, 7516354L, 7067801L, 2095950L, 6740142L, 405911L,
4057215L, 1061345L, 7754945L, 7501748L, 2188027L, 7780980L, 6651988L,
6649330L, 6655118L, 6556367L, 6463510L, 2347462L, 7675114L, 6556361L,
1061345L, 7224099L, 6463515L, 2188027L, 6463515L, 7311234L, 7764971L,
7224099L, 2347479L), C = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
3L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 25L, 2L, 1L, 2L,
1L, 1L, 1L)), .Names = c("A", "B", "C"), row.names = c(NA, -43L
), class = "data.frame")
df <- as.data.table(df)
output <- dcast.data.table(df, A ~ B, value.var = "C",
fun.aggregate = sum, fill = NA)
This produces only 2 rows, 10000000007624 and 10000000007784, and everything is summed up in just those two.
This error does not occur when using the reshape2::dcast function; that method produces the correct result.
Is there a reason why dcast.data.table is producing this error?
Recommended answer
The issue was raised on GitHub and answered by @jangorecki; this answer comes from the setNumericRounding help document:
"when joining or grouping, data.table rounds such data to apx 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand."
As such, my 14-digit numerics were being rounded and therefore combined.
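As a rough sanity check of the rounding described in the help text, the arithmetic can be sketched in base R. This assumes the standard 52-bit double significand and that "the last 2 bytes" means 16 bits are dropped. It is an estimate of the effect, not data.table's actual implementation.

```r
# Back-of-the-envelope estimate, not data.table's internal code.
x <- 10000000007624                 # one of the IDs from the question

significand_bits <- 52              # explicit significand bits in a double
dropped_bits <- 16                  # "last 2 bytes" assumed to mean 16 bits

exponent <- floor(log2(x))          # binary exponent of x

# Spacing between neighbouring representable values at this magnitude,
# at full precision and after dropping the assumed 16 bits.
spacing_full <- 2^(exponent - significand_bits)
spacing_rounded <- 2^(exponent - (significand_bits - dropped_bits))

# Spread between the smallest and largest ID in the question's data.
id_range <- 10000000007784 - 10000000007617

print(c(exponent = exponent, spacing_rounded = spacing_rounded, id_range = id_range))
```

Under these assumptions, values of this magnitude are only distinguished in steps of 128, while the IDs in the question span just 167, which is consistent with them collapsing into two groups.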
As @jangorecki mentions, this can be avoided by calling setNumericRounding(0). However, I personally have reclassified my large numerics as factors. This makes more sense for my particular use case.
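Both workarounds can be sketched on a small made-up table (not the asker's data). The column name A_id and the use of sprintf() to build the labels are illustrative choices, and the default rounding level may differ between data.table versions, so the current setting is saved and restored rather than assumed.

```r
library(data.table)

# Illustrative data: two distinct 14-digit IDs that differ only in the last digits.
dt <- data.table(
  A = c(10000000007624, 10000000007619, 10000000007624),
  B = c(1L, 1L, 2L),
  C = c(1L, 2L, 3L)
)

# Option 1: switch numeric rounding off for the cast, then restore the old setting.
old_rounding <- getNumericRounding()
setNumericRounding(0)
wide_exact <- dcast(dt, A ~ B, value.var = "C", fun.aggregate = sum)
setNumericRounding(old_rounding)

# Option 2: treat the IDs as labels rather than numbers.
# sprintf("%.0f", ...) writes out every digit; A_id is a hypothetical column name.
dt[, A_id := factor(sprintf("%.0f", A))]
wide_factor <- dcast(dt, A_id ~ B, value.var = "C", fun.aggregate = sum)

print(wide_exact)
print(wide_factor)
```

The factor route does not depend on the rounding setting at all, which is useful when the values are identifiers rather than quantities.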
Further to this, @jangorecki also advises using the bit64 package when dealing with large numerics.
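A minimal sketch of the bit64 route, again on made-up data. It assumes the bit64 package is installed and that the installed data.table version supports integer64 columns on the left side of a dcast formula. The IDs are parsed from strings so that no digits pass through a double.

```r
library(data.table)
library(bit64)

# Illustrative data: IDs stored as 64-bit integers instead of doubles.
dt <- data.table(
  A = as.integer64(c("10000000007624", "10000000007619", "10000000007624")),
  B = c(1L, 1L, 2L),
  C = c(1L, 2L, 3L)
)

# integer64 values are compared exactly, so numeric rounding should not apply here.
wide_int64 <- dcast(dt, A ~ B, value.var = "C", fun.aggregate = sum)

print(wide_int64)
```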