C++ 32bit vs 64bit floating limit

Question

Given the code segment below, I would like to know:


  • Why is the maximum value of long double smaller in 64-bit than in 32-bit?
  • Why can't the 64-bit version expand as many digits as the 32-bit version to fill the "40" precision output?
  • The values of LDBL_MIN and LDBL_MAX appear to be equal; is that a bug?

I have looked into the float.h files on my machine but cannot find the explicit definitions of these macro constants.

Testing Code (Platform = Win7-64bit)

#include <cfloat>
#include <iomanip>
#include <iostream>  // needed for cout and endl
// Fragment: assumed to sit inside main() with `using namespace std;` in effect.
cout << "FLT_MAX   =" << setprecision(40) << FLT_MAX  << endl;
cout << "DBL_MAX   =" << setprecision(40) << DBL_MAX  << endl;
cout << "LDBL_MAX  =" << setprecision(40) << LDBL_MAX << endl;
cout << "FLT_MIN   =" << setprecision(40) << FLT_MIN  << endl;
cout << "DBL_MIN   =" << setprecision(40) << DBL_MIN  << endl;
cout << "LDBL_MIN  =" << setprecision(40) << LDBL_MIN << endl;

32-bit outcome (MinGW-20120426)

FLT_MAX  =340282346638528859811704183484516925440
DBL_MAX  =1.797693134862315708145274237317043567981e+308
LDBL_MAX =1.189731495357231765021263853030970205169e+4932
FLT_MIN  =1.175494350822287507968736537222245677819e-038
DBL_MIN  =2.225073858507201383090232717332404064219e-308
LDBL_MIN =3.362103143112093506262677817321752602598e-4932

64-bit outcome (MinGW64-TDM 4.6)

FLT_MAX  =340282346638528860000000000000000000000
DBL_MAX  =1.7976931348623157e+308
LDBL_MAX =1.132619801677474e-317
FLT_MIN  =1.1754943508222875e-038
DBL_MIN  =2.2250738585072014e-308
LDBL_MIN =1.132619801677474e-317

Thanks.

Update: with the latest MinGW64-TDM 4.7.1, the "bugs" in LDBL_MAX and LDBL_MIN seem to be gone.

Recommended answer

LDBL_MAX =1.132619801677474e-317 sounds like a bug somewhere. It's a requirement of the standard that every value representable as a double can also be represented as a long double, so it's not permissible for LDBL_MAX < DBL_MAX. Given that you haven't shown your real testing code, I personally would check that before blaming the compiler.

If there really is a (non-bug) difference in long double between the two, then the basis of that difference will be that your 32-bit compiler uses the older x87 floating-point operations, which have 80-bit precision, and hence allow for an 80-bit long double.

Your 64-bit compiler uses the newer 64-bit floating-point operations in x64. Those offer no 80-bit precision, and the compiler doesn't bother switching to x87 instructions to implement a bigger long double.

There's probably more complication to it than that. For example not all x86 compilers necessarily have an 80-bit long double. How they make that decision depends on various things, possibly including the fact that SSE2 has 64-bit floating point ops. But the possibilities are that long double is the same size as double, or that it's bigger.


Why can't the 64-bit version expand as many digits as the 32-bit version to fill the "40" precision output?

A double only has about 15 decimal digits of precision. Digits beyond that are sometimes informative, but usually misleading.

I can't remember what the standard says about setprecision, but assuming the implementation is allowed to draw a line where it stops generating digits, the precision of a double is a reasonable place to draw it. As for why one implementation decided to actually do it and the other didn't -- I don't know. Since they're different distributions, they might be using completely different standard libraries.

The same "spurious precision" is why you see 340282346638528859811704183484516925440 for FLT_MAX in one case, but 340282346638528860000000000000000000000 in the other. One compiler (or rather, one library implementation) has gone to the trouble to calculate lots of digits. The other has given up early and rounded.
