What happens if you cast a big int to float


Question


This is a general question about what precisely happens when I cast a very big/small signed integer to a floating-point value using gcc 4.4.

I see some weird behaviour when doing the cast. Here are some examples:

MUST BE is obtained with this method:

float f = (float)x;
unsigned int r;
memcpy(&r, &f, sizeof(unsigned int));

./btest -f float_i2f -1 0x80800001
input:          10000000100000000000000000000001
absolute value: 01111111011111111111111111111111

exponent:       10011101
mantissa:       00000000011111101111111111111111  (right shifted absolute value)

EXPECT:         11001110111111101111111111111111  (sign|exponent|mantissa)
MUST BE:        11001110111111110000000000000000  (sign ok, exponent ok,
                                                     mantissa???)

./btest -f float_i2f -1 0x3f7fffe0

EXPECT:    01001110011111011111111111111111
MUST BE:   01001110011111100000000000000000

./btest -f float_i2f -1 0x80004999                                                                  


EXPECT:    11001110111111111111111101101100
MUST BE:   11001110111111111111111101101101    (<- 1 added at the end)

What bothers me is that in these examples the mantissa differs from what I get if I just shift my integer value to the right. The zeros at the end, for instance: where do they come from?

I only see this behaviour on big/small values. Values in the range -2^24 to 2^24 work fine.

I wonder if someone can enlighten me about what happens here. What steps need to be taken for very big/small values?

This is an add-on question to "function to convert float to int (huge integers)" (http://stackoverflow.com/questions/25699585/function-to-convert-float-to-int-huge-integers), which is not as general as this one.

EDIT: Code

unsigned float_i2f(int x) {
  if (x == 0) return 0;
  /* get sign of x */
  int sign = (x>>31) & 0x1;

  /* absolute value of x */
  int a = sign ? ~x + 1 : x;

  /* calculate exponent */
  int e = 158;
  int t = a;
  while (!(t >> 31) & 0x1) {
    t <<= 1;
    e--;
  };

  /* calculate mantissa */
  int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
  m &= 0x7fffff;

  int res = sign << 31;
  res |= (e << 23);
  res |= m;

  return res;
}

EDIT 2:

After Adam's remarks and the reference to the book Write Great Code, I updated my routine to include rounding. I still get some rounding errors (now fortunately only 1 bit off).

Now if I do a test run, I get mostly good results, but also a couple of rounding errors like these:

input:  0xfefffff5
result: 11001011100000000000000000000101
GOAL:   11001011100000000000000000000110  (1 too low)

input:  0x7fffff
result: 01001010111111111111111111111111
GOAL:   01001010111111111111111111111110  (1 too high)

unsigned float_i2f(int x) {
  if (x == 0) return 0;
  /* get sign of x */
  int sign = (x>>31) & 0x1;

  /* absolute value of x */
  int a = sign ? ~x + 1 : x;

  /* calculate exponent */
  int e = 158;
  int t = a;
  while (!(t >> 31) & 0x1) {
    t <<= 1;
    e--;
  };

  /* mask to check which bits get shifted out when rounding */
  static unsigned masks[24] = {
    0, 1, 3, 7, 
    0xf, 0x1f, 
    0x3f, 0x7f, 
    0xff, 0x1ff, 
    0x3ff, 0x7ff, 
    0xfff, 0x1fff, 
    0x3fff, 0x7fff, 
    0xffff, 0x1ffff, 
    0x3ffff, 0x7ffff, 
    0xfffff, 0x1fffff, 
    0x3fffff, 0x7fffff
  };

  /* mask to check whether to round up or down */
  static unsigned HOmasks[24] = {
    0,
    1, 2, 4, 0x8, 0x10, 0x20, 0x40, 0x80,
    0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000, 0x10000, 0x20000, 0x40000, 0x80000, 0x100000, 0x200000, 0x400000
  };

  int S = a & masks[8];
  int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
  m &= 0x7fffff;

  if (S > HOmasks[8]) {
    /* round up */
    m += 1;
  } else if (S == HOmasks[8]) {
    /* round down */
    m = m + (m & 1);
  }

  /* special case where last bit of exponent is also set in mantissa
   * and mantissa itself is 0 */
  if (m & (0x1 << 23)) {
    e += 1;
    m = 0;
  }

  int res = sign << 31;
  res |= (e << 23);
  res |= m;
  return res;
}

Does someone have any idea where the problem lies?

Solution

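C/C++ floats tend to be compatible with the IEEE 754 floating-point standard (e.g. in gcc). The zeros come from the rounding rules (http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules): a float has only a 24-bit significand, so an integer that needs more bits is rounded to the nearest representable value rather than truncated.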
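To see the rounding directly, the compiler's own conversion can be inspected with the memcpy method from the question. The following is a minimal sketch, assuming a 32-bit IEEE 754 float and the default round-to-nearest mode; the helper name cast_bits is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of (float)x.
   Assumes float is a 32-bit IEEE 754 single. */
uint32_t cast_bits(int32_t x) {
    float f = (float)x;            /* the conversion under discussion */
    uint32_t r;
    assert(sizeof r == sizeof f);  /* guard the 32-bit assumption */
    memcpy(&r, &f, sizeof r);      /* copy the bits without aliasing issues */
    return r;
}
```

Under those assumptions, 16777216 (2^24) converts exactly, while 16777217 (2^24 + 1) no longer fits in 24 bits and is rounded back to the same bit pattern as 2^24.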

Shifting a number to the right makes some bits on the right-hand side go away. Let's call them guard bits. Let's also call HO the highest bit and LO the lowest bit of the number that remains after the shift. Now suppose that the guard bits are still a part of our number. If, for example, we have 3 guard bits, the LO bit has a value of 8 (if it is set). Now if:

  1. value of guard bits > 0.5 * value of LO

    • round the number up to the smallest possible greater value, ignoring the sign (number += 1)

  2. value of guard bits == 0.5 * value of LO

    • use current number value if LO == 0
    • number += 1 otherwise

  3. value of guard bits < 0.5 * value of LO

    • use current number value
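The three cases above can be sketched as one small C function. This is an illustration rather than code from the answer: the name shift_right_round_even is hypothetical, and it computes the mask and the half-weight with shifts instead of lookup tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift `number` right by `bits` (1..31),
   rounding to nearest, with ties going to the even result. */
uint32_t shift_right_round_even(uint32_t number, unsigned bits) {
    assert(bits >= 1 && bits < 32);
    uint32_t guard  = number & ((1u << bits) - 1u); /* bits that get shifted out */
    uint32_t half   = 1u << (bits - 1);             /* 0.5 * value of LO */
    uint32_t result = number >> bits;

    if (guard > half) {
        result += 1;              /* case 1: above the midpoint, round up */
    } else if (guard == half) {
        result += result & 1u;    /* case 2: tie, add 1 only if LO is set */
    }                             /* case 3: below the midpoint, keep as is */
    return result;
}
```

For example, shifting 15 (binary 1111) right by 3 leaves guard bits 111 = 7, which is more than half of 8, so the result 1 is rounded up to 2.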

Why do 3 guard bits mean the LO value is 8?

Suppose we have a binary 8 bit number:

weights:    128 64 32 16 8 4 2 1
binary num:   0  0  0  0 1 1 1 1

Let's shift it right by 3 bits:

weights:      x x x 128 64 32 16 8 | 4 2 1
binary num:   0 0 0   0  0  0  0 1 | 1 1 1

As you can see, with 3 guard bits the LO bit ends up at the 4th position and has a weight of 8. This holds only for the purpose of rounding. The weights have to be 'normalized' afterwards, so that the weight of the LO bit becomes 1 again.

And how can I check with bit operations whether guard bits > 0.5 * value of LO?

The fastest way is to employ lookup tables. Suppose we're working on an 8-bit number:

#include <assert.h>

unsigned number;          // our number
unsigned bitsToShift;     // number of bits to shift

assert(bitsToShift < 8);  // 8 bits

// guardMasks[n] selects the n low bits that are shifted out
unsigned guardMasks[8] = {0, 1, 3, 7, 0xf, 0x1f, 0x3f, 0x7f};
// LOvalues[n] is the LO weight after shifting by n, already divided by 2 for faster comparison
unsigned LOvalues[8] = {0, 1, 2, 4, 0x8, 0x10, 0x20, 0x40};

// note: for bitsToShift == 0 nothing is shifted out, so no rounding is needed
unsigned guardBits = number & guardMasks[bitsToShift]; // value of the guard bits
number = number >> bitsToShift;

if (guardBits > LOvalues[bitsToShift]) {
...
} else if (guardBits == LOvalues[bitsToShift]) {
...
} else { // guardBits < LOvalues[bitsToShift]
...
}
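Applied to the routine from the question, the number of guard bits has to follow the position of the highest set bit. EDIT 2 always uses a fixed count of 8 on the unshifted value, which is only right when bit 31 is the highest one. The following sketch is one way to put the pieces together, assuming a 32-bit IEEE 754 float; i2f_bits is a hypothetical name, and it is not the asker's or the answerer's final code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: bit pattern of (float)x, computed with integer
   operations only and round-to-nearest-even. */
uint32_t i2f_bits(int32_t x) {
    if (x == 0) return 0;

    uint32_t sign = x < 0 ? 1u : 0u;
    /* magnitude in unsigned arithmetic, so INT32_MIN is handled too */
    uint32_t a = sign ? 0u - (uint32_t)x : (uint32_t)x;

    int top = 31;                      /* index of the highest set bit */
    while (!(a >> top)) top--;
    uint32_t e = 127u + (uint32_t)top; /* biased exponent */

    uint32_t sig;                      /* 24-bit significand, leading 1 included */
    if (top <= 23) {
        sig = a << (23 - top);         /* fits exactly, nothing to round */
    } else {
        int shift = top - 23;          /* number of guard bits, 1..8 */
        uint32_t guard = a & ((1u << shift) - 1u);
        uint32_t half  = 1u << (shift - 1);
        sig = a >> shift;
        if (guard > half || (guard == half && (sig & 1u))) sig++;
        if (sig >> 24) {               /* rounding carried into bit 24 */
            sig >>= 1;
            e++;
        }
    }
    assert((sig >> 23) == 1u);         /* normalized: leading 1 at bit 23 */
    return (sign << 31) | (e << 23) | (sig & 0x7fffffu);
}
```

With this sketch, the inputs from the question (0x80800001, 0x3f7fffe0, 0x80004999, 0xfefffff5, 0x7fffff) produce the MUST BE and GOAL bit patterns listed there.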

Reference: Write Great Code, Volume 1 by Randall Hyde
