What happens if you cast a big int to float
Question
This is a general question about what precisely happens when I cast a very big or very small signed integer to a float using gcc 4.4.
I see some weird behaviour when doing the cast. Here are some examples.
The MUST BE value is obtained with this method:
    float f = (float)x;
    unsigned int r;
    memcpy(&r, &f, sizeof(unsigned int));

    ./btest -f float_i2f -1 0x80800001
    input:          10000000100000000000000000000001
    absolute value: 01111111011111111111111111111111
    exponent:       10011101
    mantissa:       00000000011111101111111111111111 (right shifted absolute value)
    EXPECT:         11001110111111101111111111111111 (sign|exponent|mantissa)
    MUST BE:        11001110111111110000000000000000 (sign ok, exponent ok, mantissa???)

    ./btest -f float_i2f -1 0x3f7fffe0
    EXPECT:         01001110011111011111111111111111
    MUST BE:        01001110011111100000000000000000

    ./btest -f float_i2f -1 0x80004999
    EXPECT:         11001110111111111111111101101100
    MUST BE:        11001110111111111111111101101101 (<- 1 added at the end)
What bothers me is that in these examples the mantissa differs from what I get if I just shift my integer value to the right. The zeros at the end, for instance: where do they come from?
I only see this behaviour on big/small values. Values in the range -2^24 to 2^24 work fine.
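The 2^24 boundary follows from the 24 significant bits (23 stored plus one implicit) of a single-precision float: any integer that needs more bits has to be rounded. The following is a minimal sketch, assuming `float` is IEEE 754 binary32 and the default round-to-nearest mode is active. The helper names `cast_bits` and `is_exact` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit pattern of (float)x, obtained the same way as
 * the MUST BE values above. Assumes float is IEEE 754 binary32. */
uint32_t cast_bits(int32_t x) {
    float f = (float)x;
    uint32_t r;
    assert(sizeof f == sizeof r);
    memcpy(&r, &f, sizeof r);
    return r;
}

/* Hypothetical helper: returns 1 if x survives int -> float unchanged. */
int is_exact(int32_t x) {
    return (int64_t)(float)x == (int64_t)x;
}
```

Under these assumptions `is_exact(16777216)` holds while `is_exact(16777217)` does not, because 2^24 + 1 needs 25 significant bits.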
I wonder if someone can enlighten me about what happens here. What are the steps to take for very big/small values?
This is an add-on question to "function to convert float to int (huge integers)" (http://stackoverflow.com/questions/25699585/function-to-convert-float-to-int-huge-integers), which is not as general as this one.
EDIT
Code:
    unsigned float_i2f(int x) {
        if (x == 0) return 0;
        /* get sign of x */
        int sign = (x>>31) & 0x1;
        /* absolute value of x */
        int a = sign ? ~x + 1 : x;
        /* calculate exponent */
        int e = 158;
        int t = a;
        while (!(t >> 31) & 0x1) {
            t <<= 1;
            e--;
        }
        /* calculate mantissa */
        int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
        m &= 0x7fffff;
        int res = sign << 31;
        res |= (e << 23);
        res |= m;
        return res;
    }
EDIT 2:
After Adam's remarks and the reference to the book Write Great Code, I updated my routine with rounding. I still get some rounding errors (now fortunately only 1 bit off).
If I do a test run now, I get mostly good results, but also a couple of rounding errors like these:
    input:  0xfefffff5
    result: 11001011100000000000000000000101
    GOAL:   11001011100000000000000000000110 (1 too low)

    input:  0x7fffff
    result: 01001010111111111111111111111111
    GOAL:   01001010111111111111111111111110 (1 too high)

    unsigned float_i2f(int x) {
        if (x == 0) return 0;
        /* get sign of x */
        int sign = (x>>31) & 0x1;
        /* absolute value of x */
        int a = sign ? ~x + 1 : x;
        /* calculate exponent */
        int e = 158;
        int t = a;
        while (!(t >> 31) & 0x1) {
            t <<= 1;
            e--;
        }
        /* mask to check which bits get shifted out when rounding */
        static unsigned masks[24] = {
            0, 1, 3, 7,
            0xf, 0x1f, 0x3f, 0x7f,
            0xff, 0x1ff, 0x3ff, 0x7ff,
            0xfff, 0x1fff, 0x3fff, 0x7fff,
            0xffff, 0x1ffff, 0x3ffff, 0x7ffff,
            0xfffff, 0x1fffff, 0x3fffff, 0x7fffff
        };
        /* mask to check whether round up, or down */
        static unsigned HOmasks[24] = {
            0,
            1, 2, 4, 0x8, 0x10, 0x20, 0x40, 0x80,
            0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000,
            0x10000, 0x20000, 0x40000, 0x80000, 0x100000, 0x200000, 0x400000
        };
        int S = a & masks[8];
        int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
        m &= 0x7fffff;
        if (S > HOmasks[8]) {
            /* round up */
            m += 1;
        } else if (S == HOmasks[8]) {
            /* round down */
            m = m + (m & 1);
        }
        /* special case where last bit of exponent is also set in mantissa
         * and mantissa itself is 0 */
        if (m & (0x1 << 23)) {
            e += 1;
            m = 0;
        }
        int res = sign << 31;
        res |= (e << 23);
        res |= m;
        return res;
    }
Does someone have any idea where the problem lies?
Solution
C/C++ floats tend to be compatible with the IEEE 754 floating point standard (e.g. in gcc). The zeros come from the rounding rules (http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules).
Shifting a number to the right makes some bits on the right-hand side go away. Let's call them guard bits. Let's call HO the highest bit and LO the lowest bit of our number. Now suppose that the guard bits are still a part of our number. If, for example, we have 3 guard bits, the value of our LO bit is 8 (if it is set). Now if:
value of guard bits > 0.5 * value of LO
- round the number to the smallest possible greater value, ignoring the sign
value of guard bits == 0.5 * value of LO
- use the current number value if LO == 0
- number += 1 otherwise
value of guard bits < 0.5 * value of LO
- use the current number value
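The three cases above can be sketched as a single function. This is only an illustration under stated assumptions: the name `shift_round_nearest_even` is hypothetical, the value is treated as an unsigned magnitude (the sign is handled elsewhere), and the masks are computed rather than taken from a table.

```c
#include <assert.h>
#include <stdint.h>

/* Shift `number` right by `bits`, rounding to nearest with ties going to
 * the even result. The function name is an illustrative assumption. */
uint32_t shift_round_nearest_even(uint32_t number, unsigned bits) {
    assert(bits < 32);
    if (bits == 0) return number;                    /* nothing is shifted out */

    uint32_t guard  = number & ((1u << bits) - 1u);  /* bits that fall off */
    uint32_t half   = 1u << (bits - 1);              /* 0.5 * value of LO */
    uint32_t result = number >> bits;

    if (guard > half) {
        result += 1;               /* above the midpoint: round up */
    } else if (guard == half) {
        result += result & 1u;     /* tie: add 1 only if LO is set */
    }                              /* below the midpoint: keep as is */
    return result;
}
```

For instance, shifting binary 1111 right by 3 leaves guard bits 111 (value 7), which exceeds the midpoint 4, so the result 1 is rounded up to 2.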
"Why do 3 guard bits mean the LO value is 8?"
Suppose we have a binary 8-bit number:
    weights:    128  64  32  16   8   4   2   1
    binary num:   0   0   0   0   1   1   1   1
Let's shift it right by 3 bits:
    weights:      x   x   x 128  64  32  16   8 |  4   2   1
    binary num:   0   0   0   0   0   0   0   1 |  1   1   1
As you see, with 3 guard bits the LO bit ends up at the 4th position and has a weight of 8. This is true only for the purpose of rounding. The weights have to be 'normalized' afterwards, so that the weight of the LO bit becomes 1 again.
"And how can I check with bit operations if guard bits > 0.5 * value?"
The fastest way is to employ lookup tables. Suppose we're working on an 8-bit number:
    unsigned number;          // our number
    unsigned bitsToShift;     // number of bits to shift
    assert(bitsToShift < 8);  // 8 bits
    unsigned guardMasks[8] = {0, 1, 3, 7, 0xf, 0x1f, 0x3f, 0x7f};
    unsigned LOvalues[8] = {0, 1, 2, 4, 0x8, 0x10, 0x20, 0x40};  // divided by 2 for faster comparison
    unsigned guardBits = number & guardMasks[bitsToShift];  // value of the guard bits
    number = number >> bitsToShift;
    if (guardBits > LOvalues[bitsToShift]) {
        ...
    } else if (guardBits == LOvalues[bitsToShift]) {
        ...
    } else {  // guardBits < LOvalues[bitsToShift]
        ...
    }
Reference: Write Great Code, Volume 1 by Randall Hyde
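Applied to the routine from EDIT 2, one possible fix is to take the guard bits from the normalized value `t` (the 8 bits that `t >> 8` actually discards) instead of from `a`, and to add 1 on a tie only when the mantissa is odd. The sketch below is not the asker's or the book's code: the names `float_i2f_rne` and `matches_cast` are hypothetical, unsigned arithmetic is used so that the most negative int is handled, and IEEE 754 binary32 with round-to-nearest is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical corrected variant: int -> float bit pattern with
 * round-to-nearest, ties-to-even. Assumes IEEE 754 binary32. */
uint32_t float_i2f_rne(int32_t x) {
    if (x == 0) return 0;
    uint32_t sign = (uint32_t)x >> 31;
    /* magnitude in unsigned arithmetic, so INT32_MIN does not overflow */
    uint32_t a = sign ? 0u - (uint32_t)x : (uint32_t)x;
    /* normalize: move the leading 1 to bit 31, tracking the exponent */
    uint32_t e = 158;
    uint32_t t = a;
    while (!(t >> 31)) {
        t <<= 1;
        e--;
    }
    uint32_t guard = t & 0xffu;             /* the 8 bits shifted out */
    uint32_t m = (t >> 8) & 0x7fffffu;      /* 23 stored mantissa bits */
    if (guard > 0x80u || (guard == 0x80u && (m & 1u))) {
        m++;                                /* round up, or tie on odd */
    }
    if (m >> 23) {                          /* mantissa overflowed */
        e++;
        m = 0;
    }
    return (sign << 31) | (e << 23) | m;
}

/* Returns 1 if the routine agrees with the compiler's own conversion. */
int matches_cast(int32_t x) {
    float f = (float)x;
    uint32_t r;
    assert(sizeof f == sizeof r);
    memcpy(&r, &f, sizeof r);
    return r == float_i2f_rne(x);
}
```

With these changes the inputs 0xfefffff5 and 0x7fffff from EDIT 2 produce the GOAL patterns: the first is an exact tie on an odd mantissa and is rounded up, the second has no non-zero guard bits and is left alone.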