How are double-precision floating-point numbers converted to single-precision floating-point format?

Problem description

Converting numbers from double-precision floating-point format to single-precision floating-point format results in loss of precision. What's the algorithm used to achieve this conversion?

Are numbers greater than 3.4028234e+38 or less than -3.4028234e+38 simply reduced to the respective limits? I feel that the conversion process is a bit more involved than this, but I couldn't find documentation for it.

Solution

The most common floating-point formats are the binary floating-point formats specified in the IEEE 754 standard. I will answer your question for these formats. There are also decimal floating-point formats in the new (2008) version of the standard, and there are formats other than the IEEE 754 standard, but the 754 binary formats are by far the most common. Some information about rounding, and links to the standard, are in this Wikipedia page.

Converting double precision to single precision is treated the same as rounding the result of any operation. (E.g., an addition, multiplication, or square root has an exact mathematical value, and that value is rounded according to the rules to produce the result returned from the operation. For purposes of conversion, the input value is the exact mathematical value, and it is rounded.)
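As a small illustration of conversion being an ordinary rounding step, the Python sketch below narrows a value through `ctypes.c_float` and widens it back. It assumes the platform's C `float` is IEEE 754 binary32 and that the default round-to-nearest mode is in effect; `to_single` is a helper name made up for this example.

```python
import ctypes

def to_single(x: float) -> float:
    """Narrow a Python float (binary64) to a C float, then widen it back."""
    return ctypes.c_float(x).value

x = 0.1
y = to_single(x)

print(x.hex())                # 0x1.999999999999ap-4
print(y.hex())                # 0x1.99999a0000000p-4
print(y == x)                 # False: 0.1 needed more than 24 significand bits
print(to_single(0.5) == 0.5)  # True: 0.5 is exactly representable in both formats
```

The double is treated as the exact input, and the result is simply the nearest value that fits in a 24-bit significand.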

Generally, the computing environment has some default rounding mode. (Various programming languages may provide ways to change the default rounding mode or to specify it particularly with each operation.) The default rounding mode is commonly round-to-nearest. Others are round-toward-zero, round-toward-positive-infinity (upward), and round-toward-negative-infinity (downward).

In round-to-nearest mode, the representable number nearest the exact value is returned. If there is a tie, the number with the even low bit (in its fraction or significand) is returned. For this purpose, infinity effectively acts as if it were the next value in the pattern of finite numbers. In single precision, the greatest finite numbers are 0x1.fffff8p127, 0x1.fffffap127, 0x1.fffffcp127, and 0x1.fffffep127; the last of these is about 3.4028235e+38, the limit mentioned in the question. (There are 24 bits in the single-precision significand, so a step in its lowest bit is a step of 2 in the last hexadecimal digit.) For rounding purposes, infinity acts as if it were at 0x2p127, that is, 2^128, the next value in that sequence. So, if the exact result is closer to 0x1.fffffep127 (thus, less than the midpoint 0x1.ffffffp127), it is rounded to 0x1.fffffep127. If it is greater than or equal to 0x1.ffffffp127, it is rounded to infinity; the exact tie goes to infinity because 0x1.fffffep127 has an odd low bit. The situation for negative infinity is symmetric.
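The boundary described above can be probed directly. This is a sketch under the same assumptions as before: the platform's C `float` is binary32 and the default round-to-nearest mode is active. The names `to_single`, `MIDPOINT`, and `JUST_BELOW` are chosen here for illustration only.

```python
import ctypes
import math

def to_single(x: float) -> float:
    """Narrow a Python float (binary64) to a C float, then widen it back."""
    return ctypes.c_float(x).value

FLT_MAX = float.fromhex("0x1.fffffep127")            # largest finite single
MIDPOINT = float.fromhex("0x1.ffffffp127")           # halfway to the "next" value, 2**128
JUST_BELOW = float.fromhex("0x1.fffffefffffffp127")  # largest double below MIDPOINT

print(to_single(JUST_BELOW) == FLT_MAX)   # still closer to the largest finite single
print(to_single(MIDPOINT) == math.inf)    # the tie and everything above overflow
print(to_single(1e39), to_single(-1e39))  # far out of range, both signs
```

So values slightly above 3.4028235e+38 are reduced to the limit, but only up to the midpoint; from there on the result is infinity rather than the limit.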

In round-toward-positive-infinity mode, the nearest representable value that is greater than or equal to the exact value is returned. So any value above 0x1.fffffep127 rounds to infinity. Round-toward-negative-infinity returns the nearest representable value that is less than or equal to the exact value, so a value above 0x1.fffffep127 rounds down to 0x1.fffffep127. Round-toward-zero returns the nearest representable value in the direction toward zero, so it never produces infinity from a finite input: values beyond the finite range are reduced to ±0x1.fffffep127.
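Python's standard library does not offer a portable way to switch the hardware rounding mode, so the sketch below emulates the four modes in software with exact rational arithmetic. It is an illustration of the rules, not how any particular implementation works; the function name `round_to_single` and the mode strings are invented for this example.

```python
import math
from fractions import Fraction

FLT_MAX = float.fromhex("0x1.fffffep127")  # largest finite single

def round_to_single(x: float, mode: str = "nearest") -> float:
    """Round a binary64 value to binary32 precision under the given mode."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x  # zeros, infinities and NaNs pass through in this sketch
    negative = x < 0.0
    a = Fraction(abs(x))        # exact magnitude of the input
    _, ex = math.frexp(abs(x))  # abs(x) == m * 2**ex with 0.5 <= m < 1
    # Exponent of the lowest significand bit: 23 bits below the leading bit,
    # clamped to the subnormal and largest-normal ranges of binary32.
    qexp = min(max(ex - 1, -126), 127) - 23
    n = a / Fraction(2) ** qexp  # magnitude in units of that lowest bit
    lo = math.floor(n)
    rem = n - lo
    if mode == "nearest":        # ties go to the even significand
        up = rem > Fraction(1, 2) or (rem == Fraction(1, 2) and lo % 2 == 1)
    elif mode == "toward_zero":
        up = False
    elif mode == "upward":
        up = rem > 0 and not negative
    elif mode == "downward":
        up = rem > 0 and negative
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    k = lo + int(up)
    if qexp == 104 and k >= 1 << 24:
        # The magnitude reached 2**128: overflow. Modes that round away from
        # zero here give infinity; the others stop at the largest finite value.
        away = (mode == "nearest"
                or (mode == "upward" and not negative)
                or (mode == "downward" and negative))
        mag = math.inf if away else FLT_MAX
    else:
        mag = math.ldexp(k, qexp)  # exact: k fits easily in a double
    return -mag if negative else mag

big = 1e39
for mode in ("nearest", "toward_zero", "upward", "downward"):
    print(mode, round_to_single(big, mode), round_to_single(-big, mode))
```

Only the toward-zero mode, and the directed mode pointing back toward zero for a given sign, clamp out-of-range values to the finite limit.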

The IEEE 754 standard only specifies the result; it does not specify the algorithm. The method used to achieve the rounding is up to each implementation.
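Since the algorithm is left open, here is one possible software method for round-to-nearest, working directly on the bit patterns. Hardware typically performs the conversion in a single instruction, so this is only a sketch of the idea; `double_to_single_bits` is a hypothetical helper, and its NaN handling (forcing a quiet NaN and keeping the top payload bits) is an assumption rather than something the text above prescribes.

```python
import struct

def double_to_single_bits(x: float) -> int:
    """Return the binary32 bit pattern for x, rounding to nearest, ties to even."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = (bits >> 63) << 31
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)

    if exp == 0x7FF:                 # infinity or NaN
        if frac == 0:
            return sign | 0x7F800000
        return sign | 0x7FC00000 | (frac >> 29)  # assumption: quiet NaN, top payload bits
    if exp == 0:                     # zero or double subnormal: far below single's range
        return sign

    sig = frac | (1 << 52)           # 53-bit significand with the implicit leading one
    e = exp - 1023                   # unbiased exponent
    shift = 29                       # 52 - 23 low bits must be dropped
    biased = e + 127
    if e < -126:                     # result is subnormal in single: drop extra bits
        shift += -126 - e
        biased = 0
    if shift >= 54:                  # less than half the smallest subnormal
        return sign

    kept = sig >> shift
    rem = sig & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (kept & 1)):
        kept += 1                    # round up; a carry is absorbed below

    if biased == 0:
        result = kept                # kept == 1 << 23 becomes the smallest normal
    else:
        # kept still holds the leading one at bit 23, so adding it to
        # (biased - 1) << 23 also handles a carry into the exponent field.
        result = ((biased - 1) << 23) + kept
    if result >= 0x7F800000:         # exponent field saturated: overflow to infinity
        result = 0x7F800000
    return sign | result

for value in (0.1, 1.0, float.fromhex("0x1.ffffffp127"), -1e39):
    print(value, hex(double_to_single_bits(value)))
```

A quick sanity check is to compare the returned pattern with `struct.pack("<f", value)` for in-range inputs; note that `struct` raises `OverflowError` instead of producing infinity when a finite input is out of range.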
