Range of integers that can be expressed precisely as floats / doubles


Question

What is the exact range of (contiguous) integers that can be expressed as a double (resp. float)? I ask because I am curious, for questions such as this one, about when a loss of accuracy will occur.

That is:

  1. What is the least positive integer m such that m+1 cannot be precisely expressed as a double (resp. float)?
  2. What is the greatest negative integer -n such that -n-1 cannot be precisely expressed as a double (resp. float)? (May be the same as the above.)

This means that every integer between -n and m has an exact floating-point representation. I'm basically looking for the range [-n, m] for both floats and doubles.

Let's limit the scope to the standard IEEE 754 32-bit and 64-bit floating-point representations. I know that the float has 24 bits of precision and the double has 53 bits (both with a hidden leading bit), but due to the intricacies of the floating-point representation I'm looking for an authoritative answer. Please don't wave your hands!

(An ideal answer would prove that all the integers from 0 to m are expressible, and that m+1 is not.)

Recommended answer

Since you're asking about IEEE floating-point types, the language does not matter.

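#include <iostream>
using namespace std;

int main(){

    // float has a 24-bit significand
    float f0 = 16777215.; // 2^24 - 1
    float f1 = 16777216.; // 2^24
    float f2 = 16777217.; // 2^24 + 1, rounds to 2^24

    cout << (f0 == f1) << endl; // 0: 2^24 - 1 and 2^24 are distinct
    cout << (f1 == f2) << endl; // 1: 2^24 + 1 is not representable

    // double has a 53-bit significand
    double d0 = 9007199254740991.; // 2^53 - 1
    double d1 = 9007199254740992.; // 2^53
    double d2 = 9007199254740993.; // 2^53 + 1, rounds to 2^53

    cout << (d0 == d1) << endl; // 0: 2^53 - 1 and 2^53 are distinct
    cout << (d1 == d2) << endl; // 1: 2^53 + 1 is not representable
}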

Output:

0
1
0
1

So the limit for float is 2^24, and the limit for double is 2^53: every integer up to the limit is exactly representable, and limit + 1 is the first one that is not. Negatives are the same since the only difference is the sign bit.
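The same limits can be derived from the significand width instead of being hard-coded. The following is a minimal sketch, assuming a binary floating-point type; the helper names `contiguous_integer_limit` and `successor_is_distinct` are hypothetical and not part of the original answer.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: returns 2^p, where p is the significand width of T
// (including the hidden leading bit). Every integer in [-2^p, 2^p] is
// exactly representable in T; 2^p + 1 is not.
// Assumes T is a radix-2 type such as IEEE 754 float or double.
template <typename T>
T contiguous_integer_limit() {
    static_assert(std::numeric_limits<T>::radix == 2,
                  "binary floating-point type expected");
    return std::ldexp(T(1), std::numeric_limits<T>::digits);
}

// Hypothetical helper: true if x + 1, rounded to T, differs from x.
// The volatile store forces the sum to be rounded to T's precision even
// on platforms that evaluate expressions with extra precision.
template <typename T>
bool successor_is_distinct(T x) {
    volatile T y = x + T(1);
    return y != x;
}
```

Under these assumptions, `contiguous_integer_limit<float>()` evaluates to 16777216 and `contiguous_integer_limit<double>()` to 9007199254740992. `successor_is_distinct` returns true for the limit minus one and false for the limit itself, because limit + 1 rounds back to the limit.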
