How is infinity represented in a C double?
Problem description

I learned from the book Computer Systems: A Programmer's Perspective that the IEEE standard requires a double-precision floating-point number to be represented using the following 64-bit binary format:

- s: 1 bit for the sign
- exp: 11 bits for the exponent
- frac: 52 bits for the fraction

+infinity is represented as a special value with the following pattern:

- s = 0
- all exp bits are 1
- all fraction bits are 0

And I think the full 64 bits of a double should be in the following order:

(s)(exp)(frac)

So I wrote the following C code to verify it:

//Check the infinity
double x1 = (double)0x7ff0000000000000; // This should be the +infinity
double x2 = (double)0x7ff0000000000001; // Note the extra ending 1, x2 should be NaN
printf("\nx1 = %f, x2 = %f sizeof(double) = %d", x1, x2, (int)sizeof(x2)); // sizeof yields size_t, so cast it for %d
if (x1 == x2)
printf("\nx1 == x2");
else
printf("\nx1 != x2");
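
But the result is:

x1 = 9218868437227405300.000000, x2 = 9218868437227405300.000000 sizeof(double) = 8
x1 == x2

Why is the number a valid number rather than some infinity error? Why is x1 == x2? (I am using the MinGW GCC compiler.)
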
ADD 1

I modified the code as below and validated Infinity and NaN successfully.

//Check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 =* ((double *)(&x1));
double y2 =* ((double *)(&x2));
double y3 =* ((double *)(&x3));
printf("\nsizeof(long long) = %d", (int)sizeof(x1)); // sizeof yields size_t, so cast it for %d
printf("\nx1 = %f, x2 = %f, x3 = %f", x1, x2, x3); // %f is good enough for output
printf("\ny1 = %f, y2 = %f, y3 = %f", y1, y2, y3);
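
The result is:

sizeof(long long) = 8
x1 = 1.#INF00, x2 = -1.#INF00, x3 = 1.#SNAN0
y1 = 1.#INF00, y2 = -1.#INF00, y3 = 1.#QNAN0

The detailed output looks a bit strange, but I think the point is clear.

PS: It seems the pointer conversion is not necessary. Just use %f to tell the printf function to interpret the unsigned long long variable in double format. (Strictly speaking, passing an unsigned long long where %f expects a double is undefined behavior, so this only happens to work on this platform.)

ADD 2

Out of curiosity, I checked the bit representation of the variables with the following code.
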
typedef unsigned char *byte_pointer;
void show_bytes(byte_pointer start, int len)
{
int i;
for (i = len-1; i>=0; i--)
{
printf("%.2x", start[i]);
}
printf("\n");
}

And I tried the code below:

//check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 =* ((double *)(&x1));
double y2 =* ((double *)(&x2));
double y3 = *((double *)(&x3));
unsigned long long x4 = x1 + x2; // I want to check (+infinity)+(-infinity)
double y4 = y1 + y2; // I want to check (+infinity)+(-infinity)
printf("\nx1: ");
show_bytes((byte_pointer)&x1, sizeof(x1));
printf("\nx2: ");
show_bytes((byte_pointer)&x2, sizeof(x2));
printf("\nx3: ");
show_bytes((byte_pointer)&x3, sizeof(x3));
printf("\nx4: ");
show_bytes((byte_pointer)&x4, sizeof(x4));
printf("\ny1: ");
show_bytes((byte_pointer)&y1, sizeof(y1));
printf("\ny2: ");
show_bytes((byte_pointer)&y2, sizeof(y2));
printf("\ny3: ");
show_bytes((byte_pointer)&y3, sizeof(y3));
printf("\ny4: ");
show_bytes((byte_pointer)&y4, sizeof(y4));
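
The output is:

x1: 7ff0000000000000
x2: fff0000000000000
x3: 7ff0000000000001
x4: 7fe0000000000000
y1: 7ff0000000000000
y2: fff0000000000000
y3: 7ff8000000000001
y4: fff8000000000000 // <== Different from x4

The strange part is that, although x1 and x2 have the same bit patterns as y1 and y2, the sum x4 is different from y4.

And

printf("\ny4=%f", y4);

gives this:

y4=-1.#IND00 // What does it mean???

Why are they different? And how is y4 obtained?
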
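Answer

First, 0x7ff0000000000000 is indeed the bit representation of a double infinity. But the cast does not set the bit representation; it converts the logical value of 0x7ff0000000000000, interpreted as a 64-bit integer, to the nearest double. That is also why x1 == x2: a double has only 53 significant bits, so 0x7ff0000000000000 and 0x7ff0000000000001 round to the same value. So, you need to use some other way to set the bit pattern.

The straightforward way to set the bit pattern would be

uint64_t bits = 0x7ff0000000000000;
double infinity = *(double*)&bits;

However, this is undefined behavior. The C standard forbids reading a value that has been stored as one fundamental type (uint64_t) as another fundamental type (double). This is known as the strict aliasing rule, and it allows the compiler to generate better code because it can assume that a read of one type and a write of another type do not affect each other and may be reordered.

The only exception to this rule is the char types: you are explicitly allowed to access the bytes of any object through a pointer to a character type. The exception only runs in that direction, though, so reading a char array through a double* is still not covered (and the array may be misaligned). Copying the bytes into the double is safe, so you could try to use this code:

unsigned char bits[] = {0x7f, 0xf0, 0, 0, 0, 0, 0, 0};
double infinity;
memcpy(&infinity, bits, sizeof infinity); // needs <string.h>
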
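
Even though this is not undefined behavior anymore, it is still implementation-defined behavior: the order of the bytes in a double depends on your machine. The given code works on a big-endian machine (for example, big-endian configurations of ARM and the Power family), but not on x86. For x86 you need this version:

unsigned char bits[] = {0, 0, 0, 0, 0, 0, 0xf0, 0x7f};
double infinity;
memcpy(&infinity, bits, sizeof infinity); // needs <string.h>

There is really no way around this implementation-defined behavior, since there is no guarantee that a machine will store floating-point values in the same byte order as integer values. There are even machines that use byte orders like this: <1, 0, 3, 2>. I don't even want to know who came up with this brilliant idea, but it exists and we have to live with it.
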
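
To your last question: floating-point arithmetic is inherently different from integer arithmetic. The bits have special meanings, and the floating-point unit takes that into account. Especially the special values like infinities, NaNs, and denormalized numbers are treated in a special way. And since +inf + -inf is defined to yield a NaN, your floating-point unit emits the bit pattern of a NaN. The integer unit does not know about infinities or NaNs, so it just interprets the bit pattern as a huge integer and happily performs an integer addition (which happens to overflow in this case). The resulting bit pattern is not that of a NaN. It happens to be the bit pattern of a really huge, positive floating-point number (2^1023, to be precise), but that bears no meaning.
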
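Actually, there is a way to set the bit patterns of all values except NaNs in a portable way. Given three variables containing the bits of the sign, exponent, and mantissa, you can do this:

// Needs <assert.h>, <math.h>, and <stdint.h>.
uint64_t sign = ..., exponent = ..., mantissa = ...;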
double result;
assert(!(exponent == 0x7ff && mantissa)); //Can't set the bits of a NAN in this way.
if(exponent) {
//This code does not work for denormalized numbers. And it won't honor the value of mantissa when the exponent signals NAN or infinity.
result = mantissa + (1ull << 52); //Add the implicit bit.
result /= (1ull << 52); //This makes sure that the exponent is logically zero (equals the bias), so that the next operation will work as expected.
result *= pow(2, (double)((signed)exponent - 0x3ff)); //This sets the exponent.
} else {
//This code works for denormalized numbers.
result = mantissa; //No implicit bit.
result /= (1ull << 51); //This ensures that the next operation works as expected.
result *= pow(2, -0x3ff); //Scale down to the denormalized range.
}
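result *= (sign ? -1.0 : 1.0); //This sets the sign.

This uses the floating-point unit itself to move the bits into the right place. Since there is no way to interact with the mantissa bits of a NaN using floating-point arithmetic, it is not possible to include the generation of NaNs in this code. Well, you could generate a NaN, but you'd have no control over its mantissa bit pattern.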