Using round function in R on very small values returns zero


Problem description

I sometimes have to deal with very low p-values and present them in a tabular format. The values returned by R can have many significant digits (i.e. digits after the decimal point). Since the p-values are so low anyway, I tend to shorten them before writing them to .xls or .tsv files (just to make the tables look pretty!).

I am using R version 3.0.0 (2013-04-03)

Some background and examples :

9.881313e-208 in R will be 9.88e-208 in my table

I can use the round function in R to do this.

round(9.881313e-208, 210)
[1] 9.88e-208

However, the exponent can differ in every case, and since there are many such cases I use the following formula:

x = 9.881313e-208
round(x,abs(floor(log10(x)-2)))   ## I came to this following trial and error
[1] 9.88e-208

I have tested this formula empirically and it works in different cases, such as:

a <- c(1.345678e-150,8.543678e-250,5.555555e-303, 0.01123, 4.523456e-290)
round(a,abs(floor(log10(a)-2)))
[1] 1.35e-150 8.54e-250 5.56e-303  1.12e-02 4.52e-290

The problem starts once the exponent goes beyond e-306: e-307 already loses a digit, and the results get strange from e-308 onward.

## Example 1:
b <- c(1.345678e-306,1.345678e-307,1.345678e-308, 1.345678e-309, 1.345678e-310)
round(b,abs(floor(log10(b)-2)))
[1] 1.35e-306 1.30e-307 1.00e-308  0.00e+00  0.00e+00

## Example 2:
b <- c(3.345678e-306,3.345678e-307,3.345678e-308, 3.345678e-309, 3.345678e-310)
round(b,abs(floor(log10(b)-2)))
[1] 3.35e-306 3.30e-307 3.00e-308  0.00e+00  0.00e+00

## Example 3:
b <- c(7.345678e-306,7.345678e-307,7.345678e-308, 7.345678e-309, 7.345678e-310)
round(b,abs(floor(log10(b)-2)))
[1] 7.35e-306 7.30e-307 7.00e-308 1.00e-308  0.00e+00

Also, I checked these directly:

round(7.356789e-306,308)
[1] 7.36e-306

round(7.356789e-307,309)
[1] 7.4e-307

round(7.356789e-308,310)
[1] 7e-308

round(7.356789e-309,311)
[1] 1e-308

round(7.356789e-310,312)
[1] 0

round(7.356789e-311,313)
[1] 0

Am I missing something trivial here, or does the round function hit a resolution limit beyond e-308? I know these values are extremely low and almost equal to zero, but I would still prefer to keep the exact value. I saw some answers on SO using Python for this problem (see "How right way to round a very small negative number?"), but are there any suggestions for how I can overcome this in R?

Help much appreciated

Cheers

Ashwin

Solution

This answer is based on the assumption that R's floating point numbers are being represented by IEEE 754 64-bit binary numbers. That is consistent with the reported results, and inherently very likely.

Doing much arithmetic on numbers with absolute magnitude below about 2e-308 is very problematic. Below that point, precision drops with magnitude. The smallest representable number, about 4.9E-324, has one significant bit in its representation. Its two neighbours are 0 and about 1.0E-323. Any rounding error will either reduce it to zero or double it. It cannot experience a subtle rounding error that only affects the low-order digits of its decimal representation. Similarly, round cannot change it slightly. It can return it unchanged, double it, return 0, or make an even bigger change.
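
The behaviour at the very bottom of the range can be checked directly. The following is a minimal sketch, assuming R doubles are IEEE 754 64-bit binary numbers as stated above; the variable names are only illustrative, and the tiny value is built as an exact power of two rather than typed as a decimal literal.

```r
# Smallest positive normal double; values below it are subnormal (denormal).
min_normal <- .Machine$double.xmin

# Smallest positive subnormal double, about 4.9e-324, as an exact power of two.
tiny <- 2^-1074

# Its only neighbours are 0 and 2 * tiny, so every result must land on one
# of these three values.
halves_to_zero <- (tiny / 2) == 0          # the exact tie rounds to even, i.e. 0
stays_put      <- (tiny * 1.4) == tiny     # rounds back down to tiny
doubles        <- (tiny * 1.6) == 2 * tiny # rounds up to the next double

print(c(halves_to_zero, stays_put, doubles))
```

Multiplying by 1.4 or 1.6 would change a normal number by 40% or 60%; here the result is either no change at all or a doubling, which is the coarse behaviour described above.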

See Denormal number for more explanation of what is happening.

The solution is to avoid doing arithmetic on such tiny numbers if at all possible. One good approach, as already pointed out in comments, is to use logarithms. That is the only way to go if you need to deal with very large as well as very small numbers.
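
As a rough sketch of the logarithm approach: if the p-value is available as a logarithm, the three-digit table entry can be built from the log alone, without ever forming the tiny number. The helper `format_from_log10` below is hypothetical (it is not part of R), and `pnorm` is used only as an example of a distribution function that accepts `log.p = TRUE`.

```r
# Hypothetical helper: format a value known only through its base-10 logarithm
# as mantissa and exponent, e.g. "7.36e-309".
format_from_log10 <- function(log10_p, digits = 2) {
  exponent <- floor(log10_p)
  mantissa <- round(10^(log10_p - exponent), digits)
  # Rounding can push the mantissa up to 10; carry that into the exponent.
  carry <- mantissa >= 10
  mantissa[carry] <- mantissa[carry] / 10
  exponent[carry] <- exponent[carry] + 1
  fmt <- paste0("%.", digits, "fe%+03d")
  sprintf(fmt, mantissa, as.integer(exponent))
}

# Example: ask for the natural log of a tail probability, convert it to
# base 10, and format it. The probability itself is never computed.
log_p   <- pnorm(-40, log.p = TRUE)
log10_p <- log_p / log(10)
print(format_from_log10(log10_p))
```

Because only the logarithm is stored, this also works for values far below the smallest representable double.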

If that is not possible, and your numbers are all small, consider scaling by a moderately large power of two. Powers of two that are in range are exactly representable, and multiplying by them only changes the exponent, with no rounding error. You could scale all the numbers you are going to store in the text file by a constant before rounding.
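
A minimal sketch of the scaling idea, under the assumption that every value is small enough for the chosen factor to stay in range. The factor `2^200` and the variable names are arbitrary choices for illustration; the rounding step reuses the formula from the question.

```r
# Illustrative inputs taken from the question's problem range.
x <- c(7.356789e-308, 7.356789e-309, 7.356789e-310)

# Assumed scale factor: 2^200 (about 1.6e60). Any in-range power of two
# that lifts the values well above 2e-308 would do.
scale  <- 2^200
scaled <- x * scale                     # exact: only the exponent changes

# The scaling loses nothing, so dividing back recovers the inputs exactly.
lossless <- identical(scaled / scale, x)
print(lossless)

# Round the scaled values to three significant digits; they are now far
# away from the subnormal range, so the formula behaves normally.
rounded <- round(scaled, abs(floor(log10(scaled) - 2)))
print(rounded)
```

The values written to the file are then the scaled ones, so the scale factor has to be recorded alongside the table for a reader to interpret them.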
