Separate a double into its sign, exponent and mantissa


Question

I've read a few topics that take an already broken-down double and put it back together, but I am trying to break one into its base components. So far I have the sign bit nailed down:

void breakDouble( double d ){

    long L = *(long*) &d;

    int sign;
    long mask = 0x8000000000000000L;

    if( (L & mask) == mask ){

        sign = 1;

    } else {

        sign = 0;
    }
    ...
}

But I'm pretty stumped as to how to get the exponent and the mantissa. I got away with forcing the double into a long because only the leading bit mattered, so truncation didn't play a role. However, I don't think that will work for the other parts, and I know you can't use bitwise operators on floats, so I'm stuck.

Any ideas?

Edit: of course, as soon as I post this I find this, but I'm not sure how different floats and doubles are in this case.

Edit 2 (sorry, working as I go): I read the post I linked in edit 1, and it seems to me that I can perform the same operations on my double, with the mask for the exponent being:

mask = 0x7FF0000000000000L;

and the mask for the mantissa being:

mask = 0xFFFFFFFFFFFFFL;

Is this correct?

Recommended answer

The bit masks you posted in your second edit look right. However, you should be aware that:


  1. Dereferencing (long *)&mydouble as you do is a violation of C's aliasing rules. This still works under most compilers if you pass a flag like gcc's -fno-strict-aliasing, but it can lead to problems if you don't. You can instead cast to char * and look at the bits that way. It's more annoying and you have to worry about endianness, but you don't run the risk of the compiler screwing everything up. You can also create a union type like the one at the bottom of the post and write into the d member while reading from the other three.

  2. Minor portability note: long isn't the same size everywhere; maybe try using uint64_t instead? (double isn't either, but it's fairly clear that this is intended to apply only to IEEE doubles.)

  3. The trickery with bit masks only works for so-called "normal" floating-point numbers: those with a biased exponent that is neither zero (indicating zero or a subnormal) nor 2047 (indicating infinity or NaN).
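To make the three points above concrete, here is a minimal sketch, assuming double is a 64-bit IEEE 754 value. It copies the bits out with memcpy, which is a different technique from the char * cast and the union mentioned above but also avoids the aliasing problem, and it uses uint64_t instead of long. The names double_parts, break_double, parts_class and classify_parts are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three raw fields of an IEEE 754 double. */
struct double_parts {
    unsigned sign;      /* 0 or 1 */
    unsigned exponent;  /* biased exponent, 0..2047 */
    uint64_t mantissa;  /* the 52 stored fraction bits, without the implicit leading 1 */
};

enum parts_class { PARTS_NORMAL, PARTS_ZERO_OR_SUBNORMAL, PARTS_INF_OR_NAN };

struct double_parts break_double(double d)
{
    uint64_t bits;
    struct double_parts p;

    /* Assumption: double and uint64_t have the same size (true for IEEE doubles). */
    assert(sizeof bits == sizeof d);

    /* memcpy copies the object representation, so no aliasing rule is broken. */
    memcpy(&bits, &d, sizeof bits);

    p.sign     = (unsigned)(bits >> 63);
    p.exponent = (unsigned)((bits & UINT64_C(0x7FF0000000000000)) >> 52);
    p.mantissa = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return p;
}

/* The fields only mean "1.mantissa * 2^(exponent - 1023)" in the normal case. */
enum parts_class classify_parts(struct double_parts p)
{
    if (p.exponent == 0)    return PARTS_ZERO_OR_SUBNORMAL;
    if (p.exponent == 2047) return PARTS_INF_OR_NAN;
    return PARTS_NORMAL;
}
```

For example, break_double(-6.5) should give sign 1, biased exponent 1025 and mantissa 0xA000000000000, since 6.5 is 1.101 (binary) times 2^2 and 2 + 1023 = 1025.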

As Raymond Chen points out, the frexp function probably does what you actually want. frexp handles the subnormal, infinity, and NaN cases in a documented and sane way, but you pay a speed hit for using it.
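For comparison, a rough sketch of the frexp route. The struct frexp_parts and the helper split_with_frexp are hypothetical names; frexp, fabs and signbit are the standard <math.h> facilities. Note that frexp returns a fraction in [0.5, 1), so its exponent is one larger than the unbiased IEEE exponent of the same value.

```c
#include <math.h>

/* Hypothetical container: value == (negative ? -1 : 1) * fraction * 2^exponent */
struct frexp_parts {
    int    negative;  /* 1 if the sign bit is set, else 0 */
    int    exponent;  /* power of two chosen by frexp */
    double fraction;  /* in [0.5, 1) for finite nonzero input, 0 for zero */
};

struct frexp_parts split_with_frexp(double d)
{
    struct frexp_parts p;

    /* signbit also reports the sign of -0.0, which a plain d < 0 test would miss. */
    p.negative = signbit(d) != 0;

    /* Work on the magnitude so the fraction comes back non-negative. */
    p.fraction = frexp(fabs(d), &p.exponent);
    return p;
}
```

With this, split_with_frexp(-6.5) yields negative = 1, fraction = 0.8125 and exponent = 3, because 6.5 = 0.8125 * 2^3. For infinity and NaN the exponent that frexp stores is unspecified, so check for those with isinf or isnan first if they can occur.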

(Apparently there needs to be some non-list text between a list and a code block. Here it is; eat it up, markdown!)

union doublebits {
  double d;
  struct {
    unsigned long long mant : 52;
    unsigned int expo : 11;
    unsigned int sign : 1;
  };
};
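A hedged usage sketch for the union above. Bit-field order and packing are implementation-defined, and the anonymous struct needs C11 or a compiler extension, so the sketch first checks whether the layout matches the IEEE encoding on the compiler at hand. The helpers doublebits_layout_ok and print_doublebits are invented names for illustration.

```c
#include <stdio.h>

union doublebits {
  double d;
  struct {
    unsigned long long mant : 52;
    unsigned int expo : 11;
    unsigned int sign : 1;
  };
};

/* Returns 1 if the bit-fields line up with the IEEE 754 encoding here, else 0.
   Probes with -1.0, which encodes as sign 1, biased exponent 1023, fraction 0. */
int doublebits_layout_ok(void)
{
  union doublebits u;
  u.d = -1.0;
  return sizeof(union doublebits) == sizeof(double)
      && u.sign == 1 && u.expo == 1023 && u.mant == 0;
}

/* Write the d member, then read the three fields back out. */
void print_doublebits(double d)
{
  union doublebits u;
  u.d = d;
  if (!doublebits_layout_ok()) {
    printf("bit-field layout does not match on this compiler\n");
    return;
  }
  printf("sign=%u expo=%u mant=0x%013llx\n",
         (unsigned)u.sign, (unsigned)u.expo, (unsigned long long)u.mant);
}
```

On a compiler where the check passes, print_doublebits(6.5) should report sign=0, expo=1025 and mant=0xa000000000000. Where it fails, copying the bits into a uint64_t and masking is the safer route.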
