How are IEEE-754 single and double precision formats determined?

Published: 2020/11/26 4:18:25. Tags: precision, ieee-754, design-decisions, design-rationale

Question

I'm interested in how these are determined:

Single precision: 8 bits for the exponent e, and the remaining 23 bits are the mantissa.
Double precision: 11 bits for e, and the remaining 52 bits are the mantissa. Of course, there is also 1 bit for the sign.

So how was it decided how many bits go to the mantissa and how many go to e? I guess this is a noob question, but I would like to know the answer.
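For reference, the field widths listed above can be inspected directly. The following is a minimal sketch using only Python's standard `struct` module; the helper names `fields32` and `fields64` are made up for this illustration and are not part of any library.

```python
import struct

def fields32(x):
    """Round x to single precision and return (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # 1 sign bit, 8 exponent bits, 23 fraction (mantissa) bits
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fields64(x):
    """Return (sign, exponent, fraction) of x as a double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    # 1 sign bit, 11 exponent bits, 52 fraction (mantissa) bits
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(fields32(1.0))   # (0, 127, 0)
print(fields64(-2.0))  # (1, 1024, 0)
```

The exponent values printed here are the stored (biased) fields, so 127 and 1024 correspond to true exponents of 0 and 1.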

Recommended answer

If you design a format of your own, you can decide how many bits to give the exponent and the mantissa depending on whether you need more precision or a larger range. Since IEEE-754 was designed for general use, its authors had to choose what works better in most situations.

Before IEEE-754 there were many floating-point formats with different pros and cons, some of them from DEC. Initially DEC created the 32-bit F and 64-bit D formats for its VAX systems; both have 8 bits for the exponent in order to represent all important physical constants, including the Planck constant (6.626070040 × 10^-34) and the Avogadro constant (6.022140857 × 10^23). But they quickly realized that this range is quite limited and that overflow/underflow happened every now and then, so they added 3 more bits to the exponent to create a new 64-bit G format. When Dr. Kahan wrote the IEEE-754 draft he "suggested that DEC VAX's floating-point be copied because it was very good for its time", and that's why IEEE-754 single and double precision have 8 and 11 bits in the exponent part, respectively.

Another rationale for the 64-bit format is to allow repeated multiplication without overflow:

For the 64-bit format, the main consideration was range; as a minimum, the desire was that the product of any two 32-bit numbers should not overflow the 64-bit format. The final choice of exponent range provides that a product of eight 32-bit terms cannot overflow the 64-bit format — a possible boon to users of optimizing compilers which reorder the sequence of arithmetic operations from that specified by the careful programmer.

"A Proposed Standard for Binary Floating-Point Arithmetic", David Stephenson, IEEE Computer, Vol. 14, No. 3, March 1981, pp. 51-62
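The "eight 32-bit terms" claim in the quote can be checked numerically. This is only an illustrative sketch: it uses Python floats (which are IEEE-754 doubles on essentially all platforms) and a constant `FLT_MAX` defined here to stand for the largest finite single-precision value.

```python
import math

# Largest finite single-precision value: (2 - 2^-23) * 2^127, just under 2^128
FLT_MAX = (2 - 2**-23) * 2.0**127

# Multiply eight worst-case single-precision values in double precision
product = 1.0
for _ in range(8):
    product *= FLT_MAX

eight_terms_finite = math.isfinite(product)

# A ninth worst-case factor pushes the result past the double range
nine_terms = product * FLT_MAX

print(eight_terms_finite)      # True
print(math.isinf(nine_terms))  # True
```

The reason is that eight factors below 2^128 give a product below 2^1024, which is exactly where the 11-bit exponent of the 64-bit format tops out.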

It's the same reason that various DSPs have a wider accumulator register, usually 40 bits, which allows 256 32-bit values to be added without overflow.
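The accumulator arithmetic is simple enough to verify by hand or in a few lines. The sketch below assumes unsigned values for simplicity; the variable names are chosen for this example only.

```python
ACC_BITS = 40    # accumulator width
VALUE_BITS = 32  # width of each value being summed

max_value = 2**VALUE_BITS - 1        # worst-case unsigned 32-bit input
guard_bits = ACC_BITS - VALUE_BITS   # 8 extra bits of headroom
terms = 2**guard_bits                # 256 additions are guaranteed safe

# 256 worst-case values still fit in 40 bits; 257 do not
fits = max_value * terms < 2**ACC_BITS
overflows = max_value * (terms + 1) >= 2**ACC_BITS

print(terms, fits, overflows)  # 256 True True
```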

In fact, nowadays the rule for IEEE-754 interchange formats (defined for widths k of 128 bits and above) is that the exponent has round(4 log₂(k)) − 13 bits, so every time the width of the type doubles, the exponent gets 4 more bits, which allows 16 values of the narrower type to be multiplied without overflow.
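A quick way to see what the formula yields is to evaluate it for a few widths. This is a small sketch; `exponent_bits` is a name invented here, and the formula is applied outside its official range (k below 128) purely for comparison.

```python
import math

def exponent_bits(k):
    """Exponent width given by round(4 * log2(k)) - 13 for a k-bit format."""
    return round(4 * math.log2(k)) - 13

for k in (32, 64, 128, 256):
    print(k, exponent_bits(k))
# 32 7
# 64 11
# 128 15
# 256 19
```

The formula happens to reproduce the 11 bits of the 64-bit format, but it gives 7 for k = 32, whereas single precision actually uses 8. The narrower formats were fixed individually rather than derived from this rule.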

The 16-bit half-float format uses 5 bits for the exponent: with only 4 bits the range would be too narrow, and the maximum value would be much smaller than even the maximum 16-bit int value. Half-floats are mainly used in computer graphics, where 11 bits of precision are probably enough and a bigger exponent is needed for a wider dynamic range.
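To put numbers on that comparison, the largest finite value of an IEEE-style binary format can be computed from its field widths. The sketch below assumes the usual IEEE conventions (implicit leading 1, all-ones exponent reserved for infinities and NaNs); the 4-bit-exponent layout with 11 fraction bits is hypothetical, and `max_finite` is a helper written for this example.

```python
def max_finite(exp_bits, frac_bits):
    """Largest finite value of an IEEE-style format with the given field widths."""
    emax = 2 ** (exp_bits - 1) - 1            # largest unbiased exponent
    return (2 - 2.0 ** -frac_bits) * 2.0 ** emax

print(max_finite(5, 10))  # 65504.0   actual half precision
print(max_finite(4, 11))  # 255.9375  hypothetical 4-bit exponent
```

With 4 exponent bits the largest value would stay below 256, far short of the 16-bit integer maximum of 65535 (or 32767 signed), while 5 bits reach 65504.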

For more details, read "Where did the free parameters of IEEE 754 come from?"
