Grouping very small numbers (e.g. 1e-28) and 0.0 in data.table v1.8.10 vs v1.9.2
I noticed that frequency tables created by data.table in R seem not to distinguish between very small numbers and zero. Can I change this behavior, or is this a bug?
Reproducible example:
library(data.table)
DT <- data.table(c(0.0000000000000000000000000001, 2, 9999, 0))
test1 <- as.data.frame(unique(DT[, V1]))
test2 <- DT[, .N, by = V1]
As you can see, the frequency table (test2) does not recognize the difference between 0.0000000000000000000000000001 and 0, and puts both observations in the same class.
data.table version: 1.8.10
R: 3.0.2
It is worth reading R FAQ 7.31 and thinking about the accuracy of floating point representations.
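As a quick illustration of what FAQ 7.31 covers (shown in Python for brevity; R behaves the same way, since both use IEEE 754 doubles):

```python
# Decimal fractions like 0.1 have no exact binary representation,
# so arithmetic on them accumulates tiny representation errors.
print(0.1 + 0.2 == 0.3)   # False: the sum is actually 0.30000000000000004
print(repr(0.1 + 0.2))    # '0.30000000000000004'

# The usual remedy is to compare within a tolerance rather than exactly,
# which is what data.table v1.8.10 did when grouping numeric columns.
tol = 1.490116e-08        # sqrt(.Machine$double.eps), the v1.8.10 tolerance
print(abs((0.1 + 0.2) - 0.3) < tol)  # True
```

Under that tolerance-based scheme, 1e-28 and 0 differ by far less than 1.490116e-08, which is exactly why v1.8.10 grouped them together.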
I can't reproduce this in the current CRAN version (1.9.2), using:
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
My guess is that the change in behaviour is related to this NEWS item:
o Numeric data is still joined and grouped within tolerance as before but instead of tolerance being sqrt(.Machine$double.eps) == 1.490116e-08 (the same as base::all.equal's default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a global option (e.g. 2, 1 or 0 byte rounding) in a future release.
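To make the NEWS item concrete, here is a minimal sketch in Python of what rounding the significand "to the last 2 bytes" means for an IEEE 754 double. This is an illustration of the idea only, not data.table's actual C implementation, and the function name is my own:

```python
import struct

def round_last_bytes(x, nbytes=2):
    """Round a double to the nearest value whose last `nbytes`
    significand bytes are zero (a sketch of the idea, not data.table's code)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]   # raw 64-bit pattern
    shift = 8 * nbytes
    # Round half up at the cut point, then clear the low bytes.
    bits = (bits + (1 << (shift - 1))) & ~((1 << shift) - 1)
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

# Two doubles differing only in the last bits of the significand
# round to the same value, so they group together...
print(round_last_bytes(1.0) == round_last_bytes(1.0 + 2**-52))  # True

# ...but 1e-28 keeps a nonzero significand and exponent, so it
# stays distinct from 0.0 regardless of its tiny magnitude:
print(round_last_bytes(1e-28) == round_last_bytes(0.0))         # False
```

Because the rounding acts on the bit pattern rather than on an absolute tolerance, it scales with the magnitude of the number, which is why it handles both 1.23e20 and 1.23e-20 sensibly.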
Update from Matt
Yes, this was a deliberate change in v1.9.2: data.table now distinguishes 0.0000000000000000000000000001 from 0 (as user3340145 rightly thought it should), due to the improved rounding method highlighted above from NEWS.
I've also added the for loop test from Rick's answer to the test suite.
Btw, #5369 is now implemented in v1.9.3 (although neither of these is needed for this question):
o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs.
o New function setNumericRounding() may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', #5369. See example in ?setNumericRounding and NEWS item from v1.9.2. getNumericRounding() returns the current setting.
Notice that rounding is now (as from v1.9.2) about the accuracy of the significand; i.e. the number of significant figures. 0.0000000000000000000000000001 == 1.0e-28 is accurate to just 1 s.f., so the new rounding method doesn't group it together with 0.0.
In short, the answer to the question is : upgrade from v1.8.10 to v1.9.2 or greater.