Exact representation of integers in floating points

Views: 509 Published: 2020/6/12 19:24:46 Tags: c++ floating-point precision

Question

I am trying to understand the representation of integers in floating point format.

Since the IEEE single-precision floating-point format has only 23 bits for the mantissa, I expected any integer greater than 1<<22 to have only an approximate representation. This is not what I am observing in g++.

Both of the cout statements below print the same value, 33554432.

Since the mantissa is the part responsible for precision, how is it possible to represent (store) exactly a number that needs more than 23 bits to be stored accurately?

#include <iomanip>
#include <iostream>
using namespace std;

void floating_point_precision(){
  cout<< setprecision(10);
  float fp = (1<<25);
  cout<< fp <<endl;
  cout<< (1<<25) <<endl;
}

As a follow-up based on the answer below: why does the following code not print "Not equal", even though fp and i print different values?

#include <iomanip>
#include <iostream>
using namespace std;

void floating_point_precision(){
  cout<< setprecision(10);
  float fp = ((1<<25)+1);
  cout<< fp <<endl;
  int i = ((1<<25)+1);
  cout<< i <<endl;
  if(i != fp)
    cout<< "Not equal" <<endl;
}

Solution

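It's true that IEEE floating-point only has a limited number of mantissa bits. If there are 23 mantissa bits, then it can represent 2²³ distinct integer values exactly. (IEEE single precision stores 23 mantissa bits plus an implicit leading 1, so it effectively has 24 bits of precision.)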

But since floating-point stores a power-of-two exponent separately, it can (subject to the limited exponent range) represent exactly any of those 2²³ values times a power of two.

33554432 is exactly 2²⁵, so it requires just one mantissa bit to represent it exactly (plus a binary exponent that denotes multiplication by a power of two). Its binary representation is 10000000000000000000000000, which has 26 bits but only 1 significant bit. (Well, actually they're all significant, but you get the idea.)

You'll find that its neighboring integer values 33554431 and 33554433 cannot be represented exactly in 32-bit float. (But they can be represented in 64-bit double.)

More generally, the difference between consecutive representable values of type float varies with the magnitude of the value. On my system (most systems use IEEE format, but the standard doesn't require that), this program:

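#include <cmath>
#include <iomanip>
#include <iostream>

// Print the nearest representable float below f, f itself, and the nearest one above.
void show(float f) {
    std::cout << std::nextafter(f, 0.0f) << "\n"
              << f << "\n"
              << std::nextafter(f, f * 2) << "\n"
              << "\n";
}

int main() {
    // 24 significant digits are enough to print the values below exactly.
    std::cout << std::setprecision(24);

    show(1);
    show(1 << 23);
    show(1 << 24);
    show(1 << 30);
}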

produces this output:

0.999999940395355224609375
1
1.00000011920928955078125

8388607.5
8388608
8388609

16777215
16777216
16777218

1073741760
1073741824
1073741952

It shows the immediate predecessor and successor, in type float, of the numbers 1, 2²³, 2²⁴, and 2³⁰. As you can see, the gaps get bigger for larger numbers, with the gap doubling in size at each power of 2.

You'd get similar results, but with smaller gaps, with type double or long double.
