Normalization part of a code of Packing a Float (IEEE-754) into uint64_t

Published: 2020/11/8 22:34:23. Tags: c, floating-point, ieee-754


Question

I have been researching a portable way to store a float in a binary format (in a uint64_t), so that it can be shared over a network with various microcontrollers. It should be independent of the float's memory layout and of the endianness of the system.

I came across this answer. However, I am unable to understand a few lines of its code, which are shown below:

while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
fnorm = fnorm - 1.0;

// calculate the binary form (non-float) of the significand data
significand = fnorm * ((1LL<<significandbits) + 0.5f);

I am aware that the code above tries to normalize the significand. The first line of the fragment is trying to get the exponent of the float. I am not sure why the second, third and fourth lines are necessary. I understand that the second and third lines try to make the fnorm variable lie between 0.0 and 1.0, but why is that necessary? Does having fnorm (in decimal format) between 0.0 and 1.0 make sure its binary representation will be 1.xxxxxx...?

Please help me understand what each step is trying to achieve and how it achieves it. I want to understand how it changes the bit pattern of the float variable to get a normalized significand (left-most bit set to 1).

Answer

The while loops adjust the exponent in order to place the first binary 1 of fnorm just before the point (in base 2).
So fnorm is at most 1.1111111... in base 2, which is just under 2.0 in base 10.
And fnorm is at least 1.000000... in base 2, which is 1.0 in base 10.
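
As a minimal sketch of what the two loops do, the helper below applies them to a positive, finite value and hands the shift back through a pointer. The name normalize and its signature are made up for illustration; the fragment in the question has no such function. Zero, infinities and NaN are rejected here because the loops would never terminate (or never run) on them.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: scale a positive, finite value into [1.0, 2.0)
 * and count the halvings (positive shift) or doublings (negative shift),
 * so that the input equals result * 2^shift. */
static long double normalize(long double f, int *shift)
{
    /* 0 would spin forever in the second loop, infinity in the first. */
    assert(f > 0.0L && f <= LDBL_MAX);

    *shift = 0;
    while (f >= 2.0L) { f /= 2.0L; (*shift)++; }
    while (f < 1.0L)  { f *= 2.0L; (*shift)--; }
    return f;
}
```

For instance, 10.0 is halved three times (10, 5, 2.5, 1.25), so it comes back as 1.25 with a shift of 3; 1.25 is 1.01 in base 2, which has the expected 1.xxxxxx... form.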

In IEEE 754, the significand of a normalised number (not subnormal) has the form 1.xxxxxx... (base 2), which is exactly what the previous loops produce.
The first bit, before the point, is always 1; that is why it does not need to be stored.
(Maybe this last remark is the main point of your question.)

After normalisation, your algorithm subtracts 1.0, which leads to 0.xxxxx... as you saw.
Subtracting 1.0 does not lose any information as long as we remember that this subtraction is systematic.
Multiplying this float value (strictly less than 1.0, but not negative) by the integer 1LL<<significandbits gives a float which is strictly less than this big integer.
Thus, converting it into an integer gives a value that fits in significandbits bits.
(I guess the 0.5 term is meant to help round the last bit; note that the code adds it to the multiplier, not to the product.)
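
The subtraction and scaling can be sketched as follows. The helper name significand_field is hypothetical, and the input is assumed to be already normalised into [1.0, 2.0); the multiplication is the same expression as in the question's fragment.

```c
#include <stdint.h>

/* Hypothetical helper: turn a normalised value in [1.0, 2.0) into the
 * integer that goes into a significand field of the given width. */
static uint64_t significand_field(long double fnorm, unsigned significandbits)
{
    /* Drop the implicit leading 1: the value is now in [0.0, 1.0). */
    fnorm = fnorm - 1.0L;

    /* Same expression as in the fragment. The conversion to an
     * integer type truncates toward zero. */
    long long significand = fnorm * ((1LL << significandbits) + 0.5f);
    return (uint64_t)significand;
}
```

With a 23-bit field, 1.5 (1.1 in base 2) becomes 0x400000: only the top fraction bit is set and the leading 1 is gone.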

This integer contains all the significant bits that were originally in the significand of the floating-point value.
Knowing it, the shift, and the sign makes it possible to reconstruct the original floating-point value.
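
A rough sketch of that reconstruction, under the assumption that the three pieces are available as separate values (the name rebuild and its parameters are invented for illustration; a real unpacking routine would first extract them from the packed integer and remove the exponent bias):

```c
#include <stdint.h>

/* Hypothetical inverse: rebuild the value from sign, shift and the
 * integer significand produced by the packing steps. */
static long double rebuild(int sign, int shift, uint64_t significand,
                           unsigned significandbits)
{
    /* Scale the fraction back into [0.0, 1.0), then restore the
     * implicit leading 1 that was subtracted when packing. */
    long double result = (long double)significand / (1LL << significandbits);
    result += 1.0L;

    /* Undo the normalisation loops. */
    while (shift > 0) { result *= 2.0L; shift--; }
    while (shift < 0) { result /= 2.0L; shift++; }

    return sign ? -result : result;
}
```

For example, a significand of 0x200000 in a 23-bit field is the fraction 0.25; adding the implicit 1 gives 1.25, and a shift of 3 multiplies that back up to 10.0.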

But, as suggested in the comments, since the IEEE 754 bit pattern is well defined, all of this may not be necessary.
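
A minimal sketch of that alternative, assuming float is IEEE 754 binary32 on every platform involved and that floats and integers use the same byte order in memory (true on common hardware, but still an assumption). The function names float_bits and bits_to_float are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide and uses the IEEE 754 binary32 layout. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits wide");

/* Copy the bit pattern of a float into an integer, widened to uint64_t. */
static uint64_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reads the object representation */
    return bits;
}

/* Inverse: reinterpret the low 32 bits as a float. */
static float bits_to_float(uint64_t packed)
{
    uint32_t bits = (uint32_t)packed;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The resulting integer can then be sent byte by byte in a fixed order (for example most significant byte first, using shifts), so the byte order of the sending and receiving systems no longer matters.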
