How is floating point conversion actually done in C++? (double to float or float to double)

Question

I've searched for this topic and found nothing really relevant.

I tried looking at the assembly generated for this simple code:

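#include <cstdlib>  // for std::system

int main(int argc, char *argv[])
{
    double d = 1.0;
    float f = static_cast<float>(d);  // the narrowing conversion in question

    std::system("PAUSE");  // Windows-only: keeps the console window open
    return 0;
}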

which compiles to (with Visual Studio 2012):

    15:     double d = 1.0;
000000013FD7C16D  movsd       xmm0,mmword ptr [__real@3ff0000000000000 (013FD91AB0h)]  
000000013FD7C175  movsd       mmword ptr [d],xmm0  
    16:     float f = static_cast<float>(d);
000000013FD7C17B  cvtsd2ss    xmm0,mmword ptr [d]  
000000013FD7C181  movss       dword ptr [f],xmm0

I'm not that comfortable with assembly, but I tried to analyze it anyway. The first two lines seem to move the double-precision value 3ff0000000000000 into a register, and then move the content of the register to the memory address of d.

After that, I don't know exactly what the next lines do. The cvtsd2ss operation is apparently an instruction that converts a double-precision floating-point value to a single-precision floating-point value, but I couldn't find out what this instruction actually does. (The converted value is then moved to the memory location of f.)

So my question is: how is this conversion actually done by this instruction? I know that the C++ cast will yield the closest value in the other type, but apart from that, I have no idea about the actual operations performed.

Recommended answer

The cvtsd2ss instruction performs the conversion according to the current floating-point rounding mode (for SSE instructions, the rounding-control field of the MXCSR register). The default rounding mode is round-to-nearest-even.
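To see that the result really depends on the rounding mode, here is a minimal sketch. The helper name narrow_with_mode is made up for illustration. It assumes that std::fesetround affects the mode used by the hardware conversion (which is the usual behaviour on x86-64 toolchains) and uses volatile so that the compiler does not fold the conversion at compile time. Some compilers also want #pragma STDC FENV_ACCESS ON or a flag such as -frounding-math for strict guarantees.

```cpp
#include <cfenv>

// Hypothetical helper: converts d to float while the given rounding mode
// (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO) is active.
float narrow_with_mode(double d, int mode) {
    const int saved = std::fegetround();  // remember the caller's mode
    std::fesetround(mode);

    volatile double src = d;                        // force a run-time load
    volatile float dst = static_cast<float>(src);   // conversion happens here

    std::fesetround(saved);               // restore the caller's mode
    return dst;
}
```

For a double slightly above 1.0, such as 1.0 + 2^-30, this should return 1.0f under FE_TONEAREST and FE_DOWNWARD, and the next float above 1.0f under FE_UPWARD.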

In order to follow the algorithm, it helps to keep in mind the information on the IEEE 754-1985 Wikipedia page, especially the diagrams representing the layout.

First, the exponent of the target float is computed: the double type has a wider range than float, so the result may be 0.0f (or a denormal) for a very small double, or an infinite value for a very large double.
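The range cases above can be observed with a small sketch. The enum and function names are made up for illustration, and the code assumes an IEEE 754 implementation with default settings (no flush-to-zero). Strictly speaking, the C++ standard does not define the result for a double outside the range of float; the infinity shown here is what IEEE 754 hardware such as cvtsd2ss produces.

```cpp
#include <cmath>

// Hypothetical classification of what a double becomes after narrowing.
enum class Narrowed { Zero, Subnormal, Normal, Infinite, NotANumber };

Narrowed classify_narrowed(double d) {
    volatile double src = d;                  // avoid compile-time folding
    const float f = static_cast<float>(src);  // run-time conversion

    switch (std::fpclassify(f)) {
        case FP_ZERO:      return Narrowed::Zero;
        case FP_SUBNORMAL: return Narrowed::Subnormal;
        case FP_INFINITE:  return Narrowed::Infinite;
        case FP_NAN:       return Narrowed::NotANumber;
        default:           return Narrowed::Normal;
    }
}
```

Under these assumptions, 1e300 is classified as Infinite, 1e-300 as Zero, 1e-40 as Subnormal (it lies below the smallest normal float, about 1.18e-38), and 1.0 as Normal.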

For the usual case of a normal double being converted to a normal float (roughly, when the unbiased exponent of the double can be represented in the 8 exponent bits of the single-precision format), the 23 bits of the destination significand start out as the 23 most significant bits of the original number's 52-bit significand.

Then there is the problem of rounding, which depends on the 29 left-over low-order bits:

  • If the left-over bits are below 10..0, the target significand is left as-is.

  • If the left-over bits are above 10..0, the target significand is incremented. If incrementing it makes it overflow (because it is already 1..1), the carry is propagated into the exponent bits. This produces the correct result because of the careful way the IEEE 754 layout has been designed.

  • If the left-over bits are exactly 10..0, the double is exactly midway between two floats. Of these two choices, the one whose last bit is 0 ("even") is chosen.

After this step, the target significand corresponds to the float nearest to the original double.
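The steps above can be sketched in software as follows. This is an illustration of the algorithm, not what the hardware literally executes, and the name double_to_float_rne is made up. It assumes IEEE 754 binary64 and binary32 layouts and only handles the case where both the input and the result are normal; everything else is delegated to the built-in conversion.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical software conversion using round-to-nearest-even.
float double_to_float_rne(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // reinterpret the double's bits

    const std::uint32_t sign = static_cast<std::uint32_t>(bits >> 63);
    const int exp64 = static_cast<int>((bits >> 52) & 0x7FF);  // bias 1023
    const std::uint64_t frac64 = bits & ((1ull << 52) - 1);    // 52 bits

    const int exp32 = exp64 - 1023 + 127;  // re-bias for float (bias 127)
    if (exp32 < 1 || exp32 > 254) {
        // Zero, denormal, overflow, infinity, NaN: outside this sketch.
        return static_cast<float>(d);
    }

    // Keep the 23 most significant significand bits.
    std::uint32_t out = (sign << 31)
                      | (static_cast<std::uint32_t>(exp32) << 23)
                      | static_cast<std::uint32_t>(frac64 >> 29);

    const std::uint64_t leftover = frac64 & ((1ull << 29) - 1);  // 29 bits
    const std::uint64_t halfway = 1ull << 28;                    // 10..0

    // Above halfway: round up. Exactly halfway: round up only if the last
    // kept bit is 1, so that the result ends in 0 ("even").
    if (leftover > halfway || (leftover == halfway && (out & 1u))) {
        ++out;  // a carry out of the significand ripples into the exponent
    }

    float f;
    std::memcpy(&f, &out, sizeof f);
    return f;
}
```

Because the exponent field sits directly above the significand field, the single integer increment handles the carry case: a significand of 1..1 rolls over to 0..0 and the exponent goes up by one, which is exactly the next representable float.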

The directed rounding modes are simpler still. The case where the target float is a denormal is slightly more complicated (one must be careful to avoid "double-rounding").
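As an illustration of why a directed mode is simpler, here is a sketch of round-toward-zero for the normal case: the left-over bits are simply dropped. The name truncate_to_float is made up, the code assumes IEEE 754 layouts and C++17, and it deliberately returns no value for inputs whose result would not be a normal float, since those cases (including the denormal one) need extra care.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical round-toward-zero conversion, normal results only.
std::optional<float> truncate_to_float(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);

    const std::uint32_t sign = static_cast<std::uint32_t>(bits >> 63);
    const int exp32 = static_cast<int>((bits >> 52) & 0x7FF) - 1023 + 127;
    if (exp32 < 1 || exp32 > 254) {
        return std::nullopt;  // not covered by this sketch
    }

    // Keep the top 23 significand bits and discard the other 29:
    // no increment, so no carry can reach the exponent.
    const std::uint64_t frac64 = bits & ((1ull << 52) - 1);
    const std::uint32_t out = (sign << 31)
                            | (static_cast<std::uint32_t>(exp32) << 23)
                            | static_cast<std::uint32_t>(frac64 >> 29);

    float f;
    std::memcpy(&f, &out, sizeof f);
    return f;
}
```

For example, 0.1 truncates to the float just below 0.1f, whereas round-to-nearest-even gives 0.1f itself.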
