A fast method to round a double to a 32-bit int explained


Problem description

When reading Lua's source code, I noticed that Lua uses a macro to round a double to a 32-bit int. I extracted the macro, and it looks like this:

union i_cast {double d; int i[2]};
#define double2int(i, d, t)  \
    {volatile union i_cast u; u.d = (d) + 6755399441055744.0; \
    (i) = (t)u.i[ENDIANLOC];}

Here ENDIANLOC depends on endianness: it is 0 for little endian and 1 for big endian, i.e. the index of the low-order 32-bit half of the double. Lua carefully handles endianness. t stands for the integer type, like int or unsigned int.
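To make the role of ENDIANLOC concrete, here is a small sketch that finds the right index at run time instead of at compile time. Lua fixes the value in its configuration. The names `word_view`, `find_endianloc` and `round_via_union` are hypothetical and do not come from Lua. The sketch assumes IEEE 754 doubles whose byte order matches the integer byte order.

```c
#include <stdint.h>

/* Hypothetical union, like Lua's i_cast but with fixed-width halves. */
union word_view {
    double d;
    int32_t w[2];
};

/* Returns the index of the low-order 32-bit half of a double, i.e. the
   value ENDIANLOC would need on this machine. */
static int find_endianloc(void)
{
    union word_view u;
    u.d = 1.0;                  /* bit pattern 0x3FF0000000000000 */
    return u.w[0] == 0 ? 0 : 1; /* only the low half is all zeros */
}

/* The Lua-style conversion, choosing the half at run time. */
static int32_t round_via_union(double d)
{
    volatile union word_view u;
    u.d = d + 6755399441055744.0; /* 1.5 * 2^52 */
    return u.w[find_endianloc()];
}
```

On a little-endian machine such as x86-64, `find_endianloc()` returns 0, so this reduces to Lua's macro with ENDIANLOC = 0.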

I did a little research, and there's a simpler form of the macro that uses the same idea:

#define double2int(i, d) \
    {double t = ((d) + 6755399441055744.0); i = *((int *)(&t));}

Or, in C++ style:

inline int double2int(double d)
{
    d += 6755399441055744.0;
    return reinterpret_cast<int&>(d);
}

This trick works on any machine using IEEE 754 (which means pretty much every machine today). It works for both positive and negative numbers, and it rounds according to the banker's rule (round half to even). (This is not surprising, since it follows IEEE 754.)
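The pointer-cast macro and the `reinterpret_cast` version both break C/C++ strict-aliasing rules. The sketch below uses the same trick but copies the bits out with `memcpy`, and it makes the round-half-to-even behavior easy to check. `double2int_portable` is a hypothetical name. The sketch assumes four things:

- IEEE 754 doubles.
- The default round-to-nearest-even mode.
- The same byte order for doubles and 64-bit integers.
- The sum is really rounded to double. This holds for SSE2 on x86-64, but x87 extended precision can interfere.

```c
#include <stdint.h>
#include <string.h>

/* Magic-number rounding without pointer casts (a sketch, not Lua's code). */
static int32_t double2int_portable(double d)
{
    double t = d + 6755399441055744.0; /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);    /* copy the bits, no aliasing */

    uint32_t low = (uint32_t)bits;     /* keep only the lowest 32 bits */
    /* Read those bits as two's complement without an out-of-range cast. */
    if (low <= INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - 2147483648u) - INT32_MAX - 1;
}
```

With this, 0.5 and 2.5 round down to 0 and 2, while 1.5 and 3.5 round up to 2 and 4: ties go to the even neighbour.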

I wrote a little program to test it:

#include <stdio.h>

/* assumes the two-argument double2int(i, d) macro shown above */
int main()
{
    double d = -12345678.9;
    int i;
    double2int(i, d);
    printf("%d\n", i);
    return 0;
}

And it outputs -12345679, as expected.

I would like to understand in detail how this tricky macro works. The magic number 6755399441055744.0 is actually 2^51 + 2^52, or 1.5 * 2^52, and 1.5 in binary can be written as 1.1. When any 32-bit integer is added to this magic number, well, I'm lost from here. How does this trick work?

P.S.: This is in the Lua source code, llimits.h.
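A quick way to see the "binary 1.1 times 2^52" structure is to dump the bit pattern of the magic number. `double_bits` and `MAGIC` are illustrative names. The sketch assumes IEEE 754 doubles and C99 hexadecimal floating literals.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw IEEE 754 bit pattern of a double. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* 1.5 * 2^52 as a hexadecimal floating literal: binary 1.1 times 2^52. */
static const double MAGIC = 0x1.8p52;
```

The pattern is 0x4338000000000000. The sign is 0 and the exponent field is 0x433 = 1075 = 1023 + 52. The stored mantissa is 0x8000000000000, where only bit 51 is set: that is the ".1" of binary 1.1.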

UPDATE:

  1. As @Mysticial points out, this method doesn't limit itself to a 32-bit int; it can also be expanded to a 64-bit int as long as the number lies within a range of width 2^52 (about ±2^51 with this magic number). (The macro needs some modification.)
  2. Some sources say this method can't be used with Direct3D.
  3. When working with Microsoft assembler for x86, there's an even faster macro written in assembly (this is also extracted from Lua source):

    #define double2int(i,n)  __asm {__asm fld n   __asm fistp i}
    

  4. There is a similar magic number for single-precision numbers: 1.5 * 2^23.
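Update 1 can be sketched as follows. This is not the modified macro the update mentions. It is a hypothetical `double2int64_magic` that reads all 64 bits and subtracts the magic number's bit pattern. It assumes IEEE 754 doubles, round-to-nearest-even, a sum rounded to double precision, and roughly -2^51 <= d < 2^51. Within that range, d + 1.5 * 2^52 stays in [2^52, 2^53), where the exponent is fixed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit variant of the magic-number trick (not from Lua). */
static int64_t double2int64_magic(double d)
{
    /* 2251799813685248.0 is 2^51: outside this range the exponent moves. */
    assert(d >= -2251799813685248.0 && d < 2251799813685248.0);
    double t = d + 6755399441055744.0;   /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);
    /* bits == 0x4338000000000000 + rounded(d). Because bits < 2^63, the
       signed conversion is exact and the subtraction cannot overflow. */
    return (int64_t)bits - INT64_C(0x4338000000000000);
}
```

Subtracting the whole pattern, not just masking the low bits, makes negative results come out correctly as signed 64-bit values.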

Solution

A double is represented like this, from the most to the least significant bit: 1 sign bit, 11 exponent bits, 52 mantissa bits.

It can be seen as two 32-bit integers; the int taken in all the versions of your code (supposing it's a 32-bit int) is the low-order one (the right-hand half in the usual bit-layout diagram), so what you are doing in the end is just taking the lowest 32 bits of the mantissa.
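The layout can be checked with a small helper that splits a double into its fields. It also extracts the low 32-bit word that all the macro versions read. `double_fields` and `split_double` are hypothetical names. The sketch assumes IEEE 754 doubles with the same byte order as `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the IEEE 754 fields of a double. */
struct double_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 stored bits; the leading 1 is implicit */
    uint32_t low_word;  /* lowest 32 bits: what the trick reads */
};

static struct double_fields split_double(double d)
{
    uint64_t bits;
    struct double_fields f;
    memcpy(&bits, &d, sizeof bits);
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    f.low_word = (uint32_t)bits;
    return f;
}
```

For `6755399441055744.0 + 42.0`, the sign is 0 and the exponent field is 1075. The mantissa field is 2^51 + 42 and the low word is exactly 42.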


Now, to the magic number: as you correctly stated, 6755399441055744 is 2^51 + 2^52; adding such a number forces the double into the "sweet range" between 2^52 and 2^53, which, as explained in Wikipedia's article on the double-precision floating-point format, has an interesting property:

Between 2^52 = 4,503,599,627,370,496 and 2^53 = 9,007,199,254,740,992 the representable numbers are exactly the integers.

This follows from the fact that the mantissa is 52 bits wide.
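The "exactly the integers" property can be observed directly. For a positive finite double, adding 1 to its bit pattern yields the next representable double, so the difference between the two is the local spacing. `spacing_above` is a hypothetical helper. It assumes IEEE 754 doubles and a positive, finite argument below DBL_MAX.

```c
#include <stdint.h>
#include <string.h>

/* Spacing between x and the next larger double (x positive and finite). */
static double spacing_above(double x)
{
    uint64_t bits;
    double next;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;                       /* next representable double */
    memcpy(&next, &bits, sizeof next);
    return next - x;                 /* exact for neighbouring doubles */
}
```

The spacing is 0.5 at 2^51, exactly 1 anywhere in [2^52, 2^53), and 2 at 2^53. So once a value has been pushed into the sweet range, any fractional part is rounded away by the addition itself.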

The other interesting fact about adding 2^51 + 2^52 is that it affects the mantissa only in the two highest bits, which are discarded anyway, since we are taking only its lowest 32 bits.
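For non-negative integers, this means the bit pattern of the sum is simply the magic number's pattern plus k. Because the low 32 bits of that pattern are zero, k lands there untouched. `bits_after_adding`, `MAGIC` and `MAGIC_BITS` are illustrative names. The sketch assumes IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

#define MAGIC      6755399441055744.0           /* 2^52 + 2^51 */
#define MAGIC_BITS UINT64_C(0x4338000000000000) /* its bit pattern */

/* Bit pattern of MAGIC + k for 0 <= k < 2^32. The sum stays in
   [2^52, 2^53), so the exponent does not change and the pattern is
   MAGIC_BITS + k, with no carry into the upper 32 bits. */
static uint64_t bits_after_adding(uint32_t k)
{
    double t = MAGIC + (double)k;
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);
    return bits;
}
```

Even for k = 2^32 - 1, the upper word stays 0x43380000 and the lower word is k itself.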


Last but not least: the sign.

IEEE 754 floating point uses a sign-and-magnitude representation, while integers on "normal" machines use 2's complement arithmetic; how is this handled here?

We talked only about positive integers; now suppose we are dealing with a negative number in the range representable by a 32-bit int, so no smaller than -2^31; call it -a. Such a number is obviously made positive by adding the magic number, and the resulting value is 2^52 + 2^51 + (-a).

Now, what do we get if we interpret the mantissa in 2's complement representation? It must be the result of the 2's complement sum of (2^52 + 2^51) and (-a). Again, the first term affects only the upper two bits; what remains in bits 0 to 50 is the 2's complement representation of (-a) (again, minus the upper two bits).
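The same argument can be written as a short calculation. It is a restatement, not part of the original answer. It assumes 1 ≤ a ≤ 2^31 so that -a fits in a 32-bit int, which keeps the sum inside [2^52, 2^53):

```latex
\begin{aligned}
x &= 2^{52} + 2^{51} - a, \qquad 1 \le a \le 2^{31}, \qquad 2^{52} \le x < 2^{53} \\
m &= x - 2^{52} = 2^{51} - a \qquad \text{(stored 52-bit mantissa field)} \\
m \bmod 2^{32} &= (2^{51} - a) \bmod 2^{32} = 2^{32} - a \qquad \text{(since } 2^{32} \mid 2^{51}\text{)}
\end{aligned}
```

The value 2^32 - a is precisely the 32-bit two's complement bit pattern of -a.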

Since reducing a 2's complement number to a smaller width is done just by cutting away the extra bits on the left, taking the lower 32 bits gives us exactly (-a) in 32-bit 2's complement arithmetic.
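A concrete check of the negative case is below. The helper returns the low word as unsigned so the two's complement pattern is visible. `low_word_after_magic` is a hypothetical name. The sketch assumes IEEE 754 doubles, round-to-nearest, and the same byte order for doubles and `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Lowest 32 bits of the representation of d + 1.5 * 2^52. */
static uint32_t low_word_after_magic(double d)
{
    double t = d + 6755399441055744.0; /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &t, sizeof bits);
    return (uint32_t)bits;
}
```

For d = -5, the whole double is 0x4337FFFFFFFFFFFB. Its low word, 0xFFFFFFFB, is the 32-bit two's complement encoding of -5. For d = -2^31, the low word is 0x80000000.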
