Adding 32 bit floating point numbers.

Views: 216 Published: 2017/12/21 21:58:45 Tags: floating-point 32-bit floating-point-precision


Problem description

I'm learning more than I ever wanted to know about floating point numbers.

Let's say I needed to add:

1 10000000 00000000000000000000000

1 01111000 11111000000000000000000

2’s complement form.

The first bit is the sign, the next 8 bits are the exponent and the last 23 bits are the mantissa.

Without doing a conversion to scientific notation, how do I add these two numbers? Can you walk through it step by step? 

Are there any good resources for this stuff? Videos and practice examples would be great.

Solution
You have to scale the numbers so that they have the same exponent. Then you add the mantissa fields and, if necessary, normalise the result.

Oh, yes, and if they're different signs, you just call your subtraction function instead :-)

Let's do an example in decimal since it's easier to understand. Let's further assume they're stored with only eight digits to the right of the decimal point (and the numbers are between 0 inclusive and 1 exclusive).

Add the two numbers:
sign  exponent  mantissa  value
   1        42  18453284  + 0.18453284 x 10^42
   1        38  17654321  + 0.17654321 x 10^38
Scaling these numbers to the highest exponent gives something where you can add the mantissa fields:
sign  exponent  mantissa  value
   1        42  18453284  + 0.18453284 x 10^42
   1        42      1765  + 0.00001765 x 10^42
   =        ==  ========
   1        42  18455049  + 0.18455049 x 10^42
And there you have your number. This also illustrates how accuracy can be lost due to the shifting. For example, IEEE754 single precision floats will have:
1e38 + 1e-38 = 1e38
such as with:
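#include <stdio.h>
int main (void) {
    float f1 = 1e38f;       /* very large value */
    float f2 = 1e-38f;      /* very small value */
    float f3 = f1 + f2;     /* f2 is shifted out entirely when scaled to f1's exponent */
    float f4 = f1 - f3;     /* zero if f3 ended up equal to f1 */
    printf ("%.50f\n", f4);
    return 0;
}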




In terms of what happens with overflow, that's part of the normalisation I mentioned. Let's add 99999.9999 to 99999.9993. Since they already have the same exponent, no need to scale, so we just add:
sign  exponent  mantissa  value
   1         5  99999999  + 0.99999999 x 10^5
   1         5  99999993  + 0.99999993 x 10^5
   =        ==  ========
   1         5 199999992  ???
You can see here that we have a carry situation so we can't put that carry into the number, being limited to eight digits. What we do then is to shift the number to the right so that we can insert the carry. Since that shift is effectively a divide-by-ten, we have to increment the exponent to counter that.

So:
sign  exponent  mantissa  value
   1         5 199999992  ???
becomes:
sign  exponent  mantissa  value
   1         6  19999999  + 0.19999999 x 10^6
In reality, it's not just a simple right-shift since you need to round to the nearest number. If the digit you're shifting out is five or more, you need to add one to the digit on its left. That's why I chose 99999.9993 as the second number. If I had added 99999.9999 to itself, I would have ended up with:
sign  exponent  mantissa  value
   1         5 199999998  ???
which, on right shift, would have triggered quite a few carries towards the left:
sign  exponent  mantissa  value
   1         6  20000000  + 0.2 x 10^6


                        