Multiplication of floating-point numbers in binary format in C++

Views: 282 | Published: 2019/6/17 11:35:50 | Tags: C++, binary

Problem description

I am required to write code that multiplies two floating-point numbers in binary format. The code works fine when I give integer input, but it doesn't give the correct result for floating-point input.

When I give an input for example:

3.2 and 12.2

I am getting this answer:

1010011100010000

but I must get this

1001110000101000

Here is the link to my code:

http://ideone.com/q8ned7

Kindly help me. This matters for my exams.

Recommended answer

In my first solution to your question I took your question too literally: you asked for a floating-point solution.
Reading your expected result now shows that you are in fact talking about a fixed-point problem.

A fixed-point approach places the separation between the integral part and the fraction part at a fixed position within the binary format.

E.g. with a 32-bit format, 22 integral bits and 10 fraction bits:

12.2 = 1100.0011001100
 3.2 =   11.0011001100
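
As a rough illustration of the layout above, the following sketch encodes a non-negative value with a given number of fraction bits and renders it with a binary point. The helper names to_fixed and fixed_to_binary are hypothetical, and truncation (rather than rounding) is an assumption chosen to match the two bit patterns shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical helper: scale a non-negative value by 2^frac_bits and
// truncate, giving the raw fixed-point bit pattern.
std::uint32_t to_fixed(double value, int frac_bits) {
    assert(value >= 0.0);
    return static_cast<std::uint32_t>(std::floor(std::ldexp(value, frac_bits)));
}

// Hypothetical helper: render a raw pattern as "integer.fraction" in binary.
std::string fixed_to_binary(std::uint32_t raw, int frac_bits) {
    std::string int_part;
    for (std::uint32_t rest = raw >> frac_bits; rest != 0; rest >>= 1) {
        int_part.insert(int_part.begin(), (rest & 1u) ? '1' : '0');
    }
    if (int_part.empty()) {
        int_part = "0";
    }
    std::string frac_part;
    for (int bit = frac_bits - 1; bit >= 0; --bit) {
        frac_part += ((raw >> bit) & 1u) ? '1' : '0';
    }
    return int_part + "." + frac_part;
}
```

With 10 fraction bits, fixed_to_binary(to_fixed(12.2, 10), 10) gives the first pattern listed above.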

In fixed-point arithmetic, you can multiply the values as plain unsigned integers and shift the result back by the proper number of fractional bits. To get a more accurate result, you may add two additional rounding bits, giving 10 + 2 fraction bits, where only the first 10 bits are kept and the last two are used for rounding.

E.g. with a 32-bit format, 20 integral bits and 10+2 fraction bits:

12.2 = 1100.0011001100[11]
 3.2 =   11.0011001100[11]

It can be viewed as scaling: in this example you have 10+2 fractional bits, hence the values are scaled by 2^(10+2) = 4096.

12.2 x 4096 = 49971.2 --> 49971
 3.2 x 4096 = 13107.2 --> 13107

49971 x 13107 = 654969897

654969897 / 4096 = 159904.76 --> 159904
159904 / 4096 = 39.0390625 --> 39.04

159904 --> 100111.0000101000[00]
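
The arithmetic above can be checked with a small sketch. The names kFracBits, kScale, scale_up, fixed_mul and to_double are hypothetical, the inputs are assumed to be non-negative, and a 64-bit integer is used so that the intermediate product fits.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFracBits = 12;                        // 10 fraction bits + 2 rounding bits
constexpr std::uint64_t kScale = 1u << kFracBits;    // 4096

// Scale a non-negative decimal value by 4096 and truncate to an integer.
std::uint64_t scale_up(double v) {
    assert(v >= 0.0);
    return static_cast<std::uint64_t>(v * kScale);
}

// Multiply two scaled values as plain integers, then scale back once.
std::uint64_t fixed_mul(std::uint64_t a, std::uint64_t b) {
    return (a * b) / kScale;
}

// Interpret a scaled value as a real number again.
double to_double(std::uint64_t raw) {
    return static_cast<double>(raw) / kScale;
}
```

Here fixed_mul(scale_up(12.2), scale_up(3.2)) reproduces the 159904 computed by hand, and to_double(159904) is 39.0390625.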

Algorithm:
1) scale the decimal numbers by 4096 and store them as integral bit patterns
2) multiply the integral bit patterns as integers
3) divide the result by 4096 and take the resulting bit pattern as the fixed-point number
4) for printing: leave off the two rounding bits

E.g.

1)    12.2 -->               1100.0011001100[11]
       3.2 -->                 11.0011001100[11]
2)    mult --> 100111000010100000.[110000101001]
3)   scale -->             100111.0000101000[00]
4)   print --> 39.0390625 --> 39.04
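
The four steps can be put together in one function that returns the printed bit pattern. This is only a sketch: multiply_as_binary is a hypothetical name, inputs are assumed non-negative, and the two rounding bits are simply dropped in step 4 instead of being used to round.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr int kFractionBits = 10;   // bits that are printed
constexpr int kRoundingBits = 2;    // extra bits carried during the calculation
constexpr int kTotalFraction = kFractionBits + kRoundingBits;

// Hypothetical helper covering steps 1 to 4 for non-negative inputs.
std::string multiply_as_binary(double a, double b) {
    assert(a >= 0.0 && b >= 0.0);
    // 1) scale by 4096 and keep the integral bit pattern
    const auto fa = static_cast<std::uint64_t>(a * (1u << kTotalFraction));
    const auto fb = static_cast<std::uint64_t>(b * (1u << kTotalFraction));
    // 2) multiply as integers, 3) scale back by 4096
    const std::uint64_t product = (fa * fb) >> kTotalFraction;
    // 4) drop the two rounding bits (assumption: truncate, do not round)
    const std::uint64_t printed = product >> kRoundingBits;

    std::string fraction;
    for (int bit = kFractionBits - 1; bit >= 0; --bit) {
        fraction += ((printed >> bit) & 1u) ? '1' : '0';
    }
    std::string integer;
    for (std::uint64_t rest = printed >> kFractionBits; rest != 0; rest >>= 1) {
        integer.insert(integer.begin(), (rest & 1u) ? '1' : '0');
    }
    if (integer.empty()) {
        integer = "0";
    }
    return integer + "." + fraction;
}
```

For 12.2 and 3.2 this yields 100111.0000101000, which without the binary point is the 16-bit pattern the question expects.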

To avoid overflow, you should of course merge the multiplication and the scaling back (steps 2 and 3 above) into one operation, taking care to throw away the lower 10+2 fraction bits of the result while multiplying.
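
One simple way to get a similar effect, which is not necessarily what the answer has in mind, is to widen the operands to 64 bits, multiply, and shift in a single expression. The name fixed_mul_wide is hypothetical and the sketch assumes the 10+2 fraction-bit layout from above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr int kFixedFracBits = 12;   // 10 fraction bits + 2 rounding bits

// Multiply two 32-bit fixed-point patterns without losing the high bits of
// the intermediate product: widen first, then discard the low 12 bits.
std::uint32_t fixed_mul_wide(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t wide = static_cast<std::uint64_t>(a) * b;
    const std::uint64_t scaled = wide >> kFixedFracBits;
    // The final result must still fit into the 32-bit format.
    assert(scaled <= std::numeric_limits<std::uint32_t>::max());
    return static_cast<std::uint32_t>(scaled);
}
```

For example, 1000.0 x 1000.0 has the raw patterns 4096000 and 4096000; their product does not fit in 32 bits, but the scaled result 4096000000 does.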

Cheers
Andi

I assume you are talking about IEEE 754 floating-point numbers.
Such a number has

sign
mantissa
exponent (which is shifted by a bias)
some special values that indicate +/- infinity, not-a-number (NaN)

To multiply two such numbers, you need to know:

how the sign, mantissa, and exponent are stored in bits
what normalizing means
how to multiply
- r.Sign = a.Sign ^ b.Sign
- r.Mantissa = a.Mantissa * b.Mantissa
- r.Exponent = FLOAT_BIAS + (a.Exponent - FLOAT_BIAS) + (b.Exponent - FLOAT_BIAS)
- r.Normalize()
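
The multiplication rules above might look roughly like this for 32-bit floats. It is a sketch under several assumptions: float is IEEE 754 binary32, both inputs and the product are normal numbers, the mantissa includes the implicit leading 1, and the result is truncated rather than rounded, so it can differ from the hardware result in the last bit. The name multiply_normal_floats is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes float is IEEE 754 binary32");

constexpr int kMantissaBits = 23;
constexpr int kFloatBias = 127;

// Hypothetical helper: no handling of zero, subnormals, infinity, NaN,
// overflow or underflow.
float multiply_normal_floats(float a, float b) {
    assert(std::isnormal(a) && std::isnormal(b));
    std::uint32_t ra = 0, rb = 0;
    std::memcpy(&ra, &a, sizeof ra);
    std::memcpy(&rb, &b, sizeof rb);

    // r.Sign = a.Sign ^ b.Sign
    const std::uint32_t sign = (ra ^ rb) & 0x80000000u;

    // r.Exponent = BIAS + (a.Exponent - BIAS) + (b.Exponent - BIAS)
    std::int32_t exponent = static_cast<std::int32_t>((ra >> kMantissaBits) & 0xFFu)
                          + static_cast<std::int32_t>((rb >> kMantissaBits) & 0xFFu)
                          - kFloatBias;

    // r.Mantissa = a.Mantissa * b.Mantissa, with the implicit leading 1 restored
    const std::uint64_t ma = (ra & 0x7FFFFFu) | 0x800000u;
    const std::uint64_t mb = (rb & 0x7FFFFFu) | 0x800000u;
    std::uint64_t product = ma * mb;   // leading 1 is at bit 46 or bit 47

    // r.Normalize(): bring the leading 1 back to bit 46
    if (product & (std::uint64_t{1} << 47)) {
        product >>= 1;
        ++exponent;
    }

    const std::uint32_t mantissa =
        static_cast<std::uint32_t>(product >> kMantissaBits) & 0x7FFFFFu;
    const std::uint32_t raw =
        sign | (static_cast<std::uint32_t>(exponent) << kMantissaBits) | mantissa;

    float result = 0.0f;
    std::memcpy(&result, &raw, sizeof result);
    return result;
}
```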

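This is not robust at the boundaries of the value range: you may already get an overflow or underflow before normalizing. But the principle still holds.

For the technical side, use a union and bit fields:

one member of the union is an unsigned integer
the other is a bit field that represents sign, mantissa and exponent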

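#include <cstdint>

const int S_BITS = 1;   // sign bit
const int E_BITS = 8;   // biased exponent bits
const int M_BITS = 23;  // mantissa (fraction) bits
union Float
{
   std::uint32_t raw;   // exactly 32 bits; unsigned long may be 64 bits wide
   struct {
       // order assumes bit fields are allocated from the least significant bit
       std::uint32_t mantissa : M_BITS;
       std::uint32_t exponent : E_BITS;
       std::uint32_t sign     : S_BITS;
   } bits;
};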

This memory layout depends on the computer architecture on which you run your program. E.g. see MSDN: C++ bit fields.
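
Since bit-field layout is implementation-defined, an alternative sketch is to copy the float's bytes into an integer with std::memcpy and pull the fields out with shifts and masks. FloatFields and decompose are hypothetical names, and the code assumes float is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes float is IEEE 754 binary32");

struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, still biased by 127
    std::uint32_t mantissa;  // 23 bits, without the implicit leading 1
};

// Split a float into its three fields independently of bit-field layout.
FloatFields decompose(float value) {
    std::uint32_t raw = 0;
    std::memcpy(&raw, &value, sizeof raw);
    return FloatFields{raw >> 31, (raw >> 23) & 0xFFu, raw & 0x7FFFFFu};
}
```

For -1.5f this gives sign 1, exponent 127 and mantissa 0x400000.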

Good luck!

Cheers
Andi

Call me paranoid but I did not follow your link (sorry).

By "floating point", I assume you are talking about an arbitrary number of digits in "xxx.xxx" format and not the floating point values stored in float and double values, which are handled by machine hardware.

The first thing you need to do is break your number down into binary; after that do the math on the result. Digits ("bits") to the left of the decimal point (sic - actually binary point) follow the same rules as for translating integers to binary. Bits to the right of the binary point are what will make your head hurt, and correspond to 0.5, 0.25, 0.125... etc.
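
The bits to the right of the binary point can be produced by repeated doubling: each doubling moves the next bit (0.5, 0.25, 0.125, ...) in front of the point. A minimal sketch, where fraction_to_binary is a hypothetical name and the input is assumed to lie in [0, 1):

```cpp
#include <cassert>
#include <string>

// Return the first `digits` binary digits of a fraction in [0, 1).
std::string fraction_to_binary(double fraction, int digits) {
    assert(fraction >= 0.0 && fraction < 1.0);
    std::string bits;
    for (int i = 0; i < digits; ++i) {
        fraction *= 2.0;
        if (fraction >= 1.0) {   // this place value is present
            bits += '1';
            fraction -= 1.0;
        } else {
            bits += '0';
        }
    }
    return bits;
}
```

For instance, 0.625 = 0.5 + 0.125 gives "101", while 0.2 has no finite binary expansion and gives the repeating pattern "0011001100" for its first ten digits.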

Understanding the previous paragraph should solve the problem for you.
