Can all 32 bit ints be exactly represented as a double?


Question


Possible duplicate:

Which is the first integer that an IEEE 754 float is incapable of representing exactly?


This is a basic question; my feeling is that the answer is yes (int = 32 bits, double = 53-bit mantissa + sign bit).

Basically, can the following be asserted?

int x = get_random_int();
double dx = x;
int x1 = (int) dx;
assert(x1 == x);
if (INT_MAX - 10 > x)
{
    dx += 10;
    int x2 = (int) dx;
    assert(x + 10 == x2);
}


Obviously, stuff involving complicated expressions with divisions and the like won't work ((int)(5.0/3*3) is not the same as 5/3*3), but I wonder whether conversions and addition/subtraction (if no overflow occurs) preserve equivalence.
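To make the division remark concrete, here is a minimal sketch that evaluates both expressions. The function names `via_double` and `via_int` are hypothetical and exist only for this illustration.

```cpp
// Hypothetical helpers, not part of the original question.

// Division is carried out in double, so the quotient keeps its fractional
// part before the multiplication; the result is truncated only at the end.
int via_double() {
    return static_cast<int>(5.0 / 3 * 3);
}

// Division is carried out in int, so 5 / 3 truncates to 1 before the
// multiplication, giving 1 * 3.
int via_int() {
    return 5 / 3 * 3;
}
```

On an implementation using IEEE 754 doubles, `via_double()` typically yields 5, while `via_int()` yields 3 everywhere, because integer division truncates first.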

Recommended answer


If the number of bits in the mantissa is >= the number of bits in the integer, then the answer is yes. In your question you give specific, known sizes for int and the mantissa of double, but it's useful to know that this is not guaranteed by the 2003 C++ standard, which says nothing about the relative sizes of int and double's mantissa.


Note that C and C++ are not required to use IEEE 754 floating-point arithmetic. According to 3.9.1/8 of the 2003 C++ standard,



The value representation of floating-point types is implementation-defined.


In fact, C++ allows floating-point representations that don't even use binary mantissas. For C, the macros in <limits.h> and <float.h> can be used to infer information about the fundamental types. In particular, if FLT_RADIX raised to the power DBL_MANT_DIG is greater than or equal to INT_MAX, then all int values can be represented exactly. In C++, the relevant quantities are named numeric_limits<double>::radix, numeric_limits<double>::digits and numeric_limits<int>::max(), all from <limits>.
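The criterion above could be checked at run time along these lines. This is only a sketch: the function name `all_ints_fit_in_double` is made up for the example, and the power is computed in unsigned integer arithmetic (rather than in double) so that the check does not depend on the very conversion it is testing.

```cpp
#include <limits>

// Hypothetical helper: returns true when radix^digits >= INT_MAX, i.e. the
// criterion from the answer. It looks at INT_MAX only, as the answer does.
bool all_ints_fit_in_double() {
    const unsigned long long int_max = std::numeric_limits<int>::max();
    const unsigned long long radix = std::numeric_limits<double>::radix;
    const int digits = std::numeric_limits<double>::digits;

    unsigned long long power = 1;  // holds radix^i at the top of iteration i
    for (int i = 0; i < digits; ++i) {
        // If the next multiplication would already exceed INT_MAX, then
        // radix^digits certainly does too; stop early to avoid overflow.
        if (power > int_max / radix) {
            return true;
        }
        power *= radix;
    }
    // Here power == radix^digits and it did not exceed INT_MAX.
    return power >= int_max;
}
```

With a 32-bit int and an IEEE 754 double (radix 2, 53 digits), the loop exits early and the function returns true. With a 64-bit int and the same double it would return false.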


Given two integer operands and an operation that always produces an integer from integer operands (such as + or *, but not /), all IEEE 754 rounding modes will produce an integer exactly. If this integer is representable in an int (and therefore exactly representable in a double, given our assumption that its mantissa is at least as wide as an int), then it will be the same integer you would get by using the corresponding integer operation. Any sensible FP implementation will preserve the above guarantees, even if it is not IEEE 754 compliant.
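As an illustration of the guarantees described above, the sketch below compares integer and double arithmetic for results that fit in an int. The helper names are hypothetical, and the code assumes that the double mantissa is at least as wide as int and that long long is at least twice as wide as int (both true on common platforms, but not required by the standard).

```cpp
#include <limits>

// Hypothetical helper: does an exact result fit in an int?
bool fits_in_int(long long v) {
    return v >= std::numeric_limits<int>::min() &&
           v <= std::numeric_limits<int>::max();
}

// int -> double -> int should give back the original value.
bool round_trips(int x) {
    double dx = x;
    return static_cast<int>(dx) == x;
}

// Addition done in double should match integer addition whenever the
// exact sum is representable in an int.
bool sum_matches(int a, int b) {
    long long exact = static_cast<long long>(a) + b;
    if (!fits_in_int(exact)) {
        return true;  // overflow case is outside the claim being checked
    }
    double d = static_cast<double>(a) + static_cast<double>(b);
    return static_cast<int>(d) == static_cast<int>(exact);
}

// Same check for multiplication.
bool product_matches(int a, int b) {
    long long exact = static_cast<long long>(a) * b;
    if (!fits_in_int(exact)) {
        return true;  // overflow case is outside the claim being checked
    }
    double d = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<int>(d) == static_cast<int>(exact);
}
```

Under those assumptions, all three functions are expected to return true for any arguments, including the extremes of the int range.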
