Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2


Question


I noticed that frequency tables created by data.table in R do not seem to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?

Reproducible example:

    library(data.table)
    DT <- data.table(c(0.0000000000000000000000000001, 2, 9999, 0))
    test1 <- as.data.frame(unique(DT[, V1]))  # base R: distinct values of V1
    test2 <- DT[, .N, by = V1]                # data.table: frequency table by V1

As you can see, the frequency table (test2) does not recognize the difference between 0.0000000000000000000000000001 and 0, and puts both observations in the same group.

data.table version: 1.8.10
R version: 3.0.2

Solution

It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
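As a minimal base R sketch of the point behind FAQ 7.31 (the variable names are only illustrative): exact comparison of doubles can be surprising, and comparing within an absolute tolerance of sqrt(.Machine$double.eps) makes 1e-28 indistinguishable from 0.

```r
# Exact comparison of doubles: 0.1 + 0.2 is not stored as exactly 0.3
exact_equal <- (0.1 + 0.2 == 0.3)                 # FALSE
near_equal  <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE

# Under exact comparison, 1e-28 and 0 are different values ...
tiny_is_zero_exact <- (1e-28 == 0)                # FALSE

# ... but within an absolute tolerance of about 1.49e-08 they are not
tol <- sqrt(.Machine$double.eps)
tiny_is_zero_tol <- abs(1e-28 - 0) < tol          # TRUE
```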

I can't reproduce this in the current CRAN version (1.9.2), using:

R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

My guess is that the change in behaviour is related to this NEWS item:

o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
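To illustrate the idea of a tolerance that lives in the significand rather than in absolute terms, here is a rough base R sketch. The helper drop_last_two_bytes() is hypothetical and is not data.table's implementation: it truncates the two least-significant bytes of each double, whereas the NEWS item describes rounding. It is only meant to show why 1e-28 stays distinct from 0 while values that agree to roughly 11 significant figures collapse together.

```r
# Hypothetical helper (NOT data.table's code): zero the 2 least-significant
# bytes of each double. data.table rounds; this sketch simply truncates.
drop_last_two_bytes <- function(x) {
  vapply(x, function(v) {
    b <- writeBin(v, raw(), endian = "big")   # 8 bytes, most significant first
    b[7:8] <- as.raw(0)                        # clear the lowest 16 significand bits
    readBin(b, what = "double", n = 1L, endian = "big")
  }, numeric(1))
}

x    <- 1e-28
kept <- drop_last_two_bytes(x)        # still about 1e-28, not 0
near <- kept * (1 + 2^-40)            # differs from `kept` only in the low bits

same_group <- identical(drop_last_two_bytes(near), kept)  # TRUE: collapses onto `kept`
zero_stays <- drop_last_two_bytes(0) == 0                 # TRUE: 0 is unchanged
tiny_vs_0  <- kept == 0                                   # FALSE: 1e-28 is not merged with 0
```

Because only the low bits of the significand are discarded, the effective tolerance scales with the magnitude of the number, which is why it behaves sensibly for both 1.23e20 and 1.23e-20.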


Update from Matt

Yes, this was a deliberate change in v1.9.2, and data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should) thanks to the improved rounding method described in the NEWS item above.

I've also added the for loop test from Rick's answer to the test suite.

Btw, #5369 is now implemented in v1.9.3 (although neither of these is needed for this question):

o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.

o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.
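A short usage sketch of these two functions, assuming a data.table version that provides them (v1.9.3 or later). The example data and the variable names are made up for illustration, and the default rounding level may differ between data.table versions, so the current setting is saved and restored.

```r
library(data.table)

old <- getNumericRounding()    # remember the current setting

# Illustrative data: the first two values differ only far down the significand
DT <- data.table(V1 = c(1.23e-20, 1.23e-20 * (1 + 1e-13), 0, 1e-28))

setNumericRounding(2L)         # 2-byte rounding, as described for v1.9.2
n_groups_2 <- nrow(DT[, .N, by = V1])

setNumericRounding(0L)         # no rounding: each distinct double is its own group
n_groups_0 <- nrow(DT[, .N, by = V1])

setNumericRounding(old)        # restore the previous setting
```

With 0-byte rounding all four values form separate groups; with 2-byte rounding the first two would be expected to fall into the same group, while 1e-28 and 0 remain separate in both cases.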

Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group this together with 0.0.

In short, the answer to the question is: upgrade from v1.8.10 to v1.9.2 or later.
