How to round a double/float to BINARY precision?

Question

I am writing tests for code that performs calculations on floating-point numbers. As expected, the results are rarely exact, so I would like to set a tolerance between the calculated and the expected result. I have verified that in practice, with double precision, the results are always correct after rounding off the last two significant decimal digits, and usually correct after rounding off only the last one. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal, and faster via multiplication, Math.round and division). However, since the mantissa is stored in binary, is there a way to perform the rounding in base 2 rather than base 10?
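For comparison, the two decimal approaches mentioned above might look like the following minimal sketch. The class and method names are hypothetical, chosen only for illustration; the BigDecimal variant rounds to significant digits, while the multiply/round/divide variant rounds to a fixed number of decimal places.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical helper class, for illustration only.
final class DecimalRounding {

    // Precise: round to the given number of significant decimal digits.
    // Note: new BigDecimal(double) throws NumberFormatException for NaN and infinities.
    static double roundSignificant(double x, int digits) {
        return new BigDecimal(x).round(new MathContext(digits)).doubleValue();
    }

    // Faster: round to the given number of decimal places by scaling,
    // rounding to the nearest long, and scaling back.
    static double roundPlaces(double x, int places) {
        double scale = 1.0;
        for (int i = 0; i < places; i++) {
            scale *= 10.0; // exact for small powers of ten
        }
        return Math.round(x * scale) / scale;
    }
}
```

Both variants work in base 10, so the number of binary digits they discard varies with the magnitude of the value.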

Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa when its second least significant bit is set, I could probably reach the limit of accuracy. This would be easy enough, except that I have no idea how to handle overflow (when all of bits 52-1 are set).
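The bit-level idea described above can be sketched on the raw IEEE 754 encoding as follows. This is an illustrative sketch, not a tested library routine; the class and method names are hypothetical. It adds half of the discarded range to the bit pattern and then clears the low bits, so ties round away from zero. Because the exponent field sits directly above the fraction field in the encoding, a carry out of an all-ones fraction simply increments the exponent, which gives the next power of two.

```java
// Hypothetical helper class, for illustration only.
final class MantissaRounding {

    // Rounds x to (53 - dropBits) significant bits using integer arithmetic
    // on the raw bit pattern. Ties round away from zero.
    static double roundOffLowBits(double x, int dropBits) {
        if (dropBits < 1 || dropBits > 51) {
            throw new IllegalArgumentException("dropBits must be in 1..51");
        }
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            return x; // nothing sensible to round
        }
        long bits = Double.doubleToLongBits(x);
        long half = 1L << (dropBits - 1); // half of the discarded range
        long mask = -1L << dropBits;      // keeps sign, exponent and high fraction bits
        // A carry out of the fraction field bumps the exponent by one,
        // so an all-ones fraction rounds up to the next power of two.
        // For values at the very top of the range the result is infinity.
        return Double.longBitsToDouble((bits + half) & mask);
    }
}
```

For example, rounding off 3 bits of the largest double below 2.0 carries into the exponent and returns exactly 2.0.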

A Java solution would be preferred, but I could probably port one from another language if I understood it.

Because part of the problem was that my code is generic with regard to arithmetic (it relies on the scala.Numeric type class), I incorporated the rounding suggested in the answer into a new numeric type that carries the calculated number (floating point in this case) together with its rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share the same arithmetic, i.e. the number type).
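The range-based comparison described above could look roughly like this in Java. This is only a sketch under assumptions: the original type was written against scala.Numeric and is not shown, so the class name, fields, and the choice to express the error as a multiple of Math.ulp are all hypothetical.

```java
// Hypothetical value-plus-error type, for illustration only.
final class ApproxDouble {
    final double value; // the calculated number
    final double error; // non-negative half-width of the error range

    ApproxDouble(double value, double error) {
        if (!(error >= 0)) {
            throw new IllegalArgumentException("error must be non-negative");
        }
        this.value = value;
        this.error = error;
    }

    // Assumption: the error is given as a number of units in the last place of value.
    static ApproxDouble of(double value, int ulps) {
        return new ApproxDouble(value, ulps * Math.ulp(value));
    }

    // True when [value - error, value + error] intersects the other range.
    boolean overlaps(ApproxDouble other) {
        return Math.abs(value - other.value) <= error + other.error;
    }
}
```

Overlap is not transitive (a can overlap b and b can overlap c without a overlapping c), so this sketch exposes it as a separate method rather than as equals.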

Recommended answer

Yes, rounding off binary digits makes more sense than going through BigDecimal, and it can be implemented very efficiently as long as you are not worried about values within a small factor of Double.MAX_VALUE.

You can round a double value x with the following sequence in Java (untested):

double t = 9 * x;       // beware: this overflows if x is too close to Double.MAX_VALUE
double y = (x - t) + t; // same evaluation order as x - t + t; the low bits of x are rounded off

After this sequence, y should contain the rounded value. The constant has the form 2^s + 1, where s is the number of bits rounded off, so adjust the distance between its two set bits to change s. The constant 9 used above rounds off three bits, 3 rounds off one bit, 5 rounds off two bits, 17 rounds off four bits, and so on.
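The sequence can be wrapped in a method that takes the number of bits to round off as a parameter. The following is a minimal sketch; the class and method names are hypothetical, and it assumes the default IEEE 754 round-to-nearest double arithmetic.

```java
// Hypothetical helper class, for illustration only.
final class VeltkampRounding {

    // Rounds x to (53 - s) significant bits using the sequence above.
    static double roundOffBits(double x, int s) {
        if (s < 1 || s > 51) {
            throw new IllegalArgumentException("s must be in 1..51");
        }
        double c = (double) ((1L << s) + 1); // 2^s + 1, e.g. 9 for s = 3
        double t = c * x;                    // overflows if x is too close to Double.MAX_VALUE
        return (x - t) + t;                  // evaluation order matters: subtract first, then add
    }
}
```

With s = 3, the double just above 1.0 rounds down to 1.0 and the double just below 2.0 rounds up to 2.0. When c * x overflows, the result is NaN rather than a rounded value, which is the limitation near Double.MAX_VALUE mentioned earlier.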

This sequence of instructions is attributed to Veltkamp and is typically used in "Dekker multiplication". This page has some references.
