Analysis of float/double precision in 32 decimal digits


Question

In a .c file written by someone else, I saw this:

const float c = 0.70710678118654752440084436210485f;

There, he wants to avoid computing sqrt(1/2).

Can this really be stored in plain C/C++ without losing precision? It seems impossible to me.

I am using C++, but I do not believe the precision differs much between the two languages (if at all), which is why I did not test it in C.

So I wrote these few lines to look at the behaviour of the code:

#include <iomanip>   // std::setprecision
#include <iostream>

int main() {
    std::cout << "Number:    0.70710678118654752440084436210485\n";

    const float f = 0.70710678118654752440084436210485f;
    std::cout << "float:     " << std::setprecision(32) << f << std::endl;

    const double d = 0.70710678118654752440084436210485; // no f extension
    std::cout << "double:    " << std::setprecision(32) << d << std::endl;

    const double df = 0.70710678118654752440084436210485f;
    std::cout << "doublef:   " << std::setprecision(32) << df << std::endl;

    const long double ld = 0.70710678118654752440084436210485;
    std::cout << "l double:  " << std::setprecision(32) << ld << std::endl;

    const long double ldl = 0.70710678118654752440084436210485l; // l suffix!
    std::cout << "l doublel: " << std::setprecision(32) << ldl << std::endl;
}

The output is this:

                   *       ** ***
                   v        v v
Number:    0.70710678118654752440084436210485    // 32 decimal digits
float:     0.707106769084930419921875            // 24 decimal digits
double:    0.70710678118654757273731092936941
doublef:   0.707106769084930419921875            // same as float
l double:  0.70710678118654757273731092936941    // same as double
l doublel: 0.70710678118654752438189403651592    // suffix l

where * marks the last accurate digit of the float, ** the last accurate digit of the double, and *** the last accurate digit of the long double.

The double output has 32 decimal digits because I set the precision of std::cout to that value.

The float output has 24 digits, as expected, as stated here (http://stackoverflow.com/questions/19292283/0-1-float-is-greater-than-0-1-double-i-expected-it-to-be-false):

float has 24 binary bits of precision, and double has 53.

I expected the doublef output to match the double output, i.e. that the f suffix would not prevent the number from becoming a double. I think that when I write this:

const double df = 0.70710678118654752440084436210485f;

what happens is that the number first becomes a float and is then stored as a double, so after the 24th decimal digit it has only zeroes, and that is why the double's precision stops there.

Am I correct?

From this answer (http://stackoverflow.com/questions/5199338/what-is-the-significance-of-0-0f-when-initializing-in-c) I found some relevant information:

float x = 0 has an implicit typecast from int to float.
float x = 0.0f does not have such a typecast.
float x = 0.0 has an implicit typecast from double to float.

As for __float128, it is not standard, so it is out of the running. See more here (http://stackoverflow.com/questions/23654693/print-float128-without-using-quadmath-snprintf).

Recommended answer

From the standard:

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

So you can see the issue with this question: the standard doesn't actually say how precise floats are.

In terms of standard implementations, you need to look at IEEE 754, which means the other two answers, from Irineau and Davidmh, are perfectly valid approaches to the problem.
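As a rough illustration of what "look at IEEE 754" means in practice, the sketch below asks the implementation what it provides through std::numeric_limits. It assumes an IEEE 754 platform (binary32 float, binary64 double); the names float_is_ieee, float_bits, check_assumptions and so on are made up for this example. On such a platform the 24 and 53 quoted in the question count binary digits, not decimal ones. The float output happens to show exactly 24 decimal digits because a float in [0.5, 1) is an integer divided by 2^24, and such a fraction has at most 24 decimal places.

```cpp
#include <cassert>
#include <limits>

// Whether this implementation claims IEC 559 (IEEE 754) behaviour for float.
constexpr bool float_is_ieee = std::numeric_limits<float>::is_iec559;

// Significand width in bits, and the number of decimal digits guaranteed to
// survive a decimal -> binary -> decimal round trip.
constexpr int float_bits    = std::numeric_limits<float>::digits;
constexpr int float_digits  = std::numeric_limits<float>::digits10;
constexpr int double_bits   = std::numeric_limits<double>::digits;
constexpr int double_digits = std::numeric_limits<double>::digits10;

// Assumption: IEEE 754 binary32, where the float nearest sqrt(1/2) is
// 11863283 / 2^24. Scaling by a power of two is exact, so the comparison
// below involves no extra rounding.
bool float_constant_is_k_over_2_24() {
    const float f = 0.70710678118654752440084436210485f;
    return f * 16777216.0f == 11863283.0f;
}

// Hypothetical helper: fails loudly if the platform does not match the
// assumptions stated above.
void check_assumptions() {
    assert(float_bits == 24 && double_bits == 53);
    assert(float_constant_is_k_over_2_24());
}
```

Calling check_assumptions() from main is enough to confirm the assumptions on a given compiler; on a non-IEEE platform the assertions may fail, which is itself the point of the quoted passage.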

As for the suffix letters that indicate the type, again looking at the standard:

The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double.

So your attempt to create a long double will only have the precision of the double literal you assign to it, unless you use the L suffix.
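The suffix rule can be checked directly. The following is a minimal sketch, reusing the variable names from the question; the helper names widening_is_exact and float_literal_lost_digits are invented here. It shows that the suffix alone fixes the literal's type, and that assigning the result to a wider type only widens the already rounded value. Whether ldl actually differs from ld depends on whether long double is wider than double on the platform, so that is deliberately not asserted.

```cpp
#include <cassert>
#include <type_traits>

// The suffix alone decides the type of a floating literal.
static_assert(std::is_same<decltype(1.0f), float>::value, "f suffix gives float");
static_assert(std::is_same<decltype(1.0), double>::value, "no suffix gives double");
static_assert(std::is_same<decltype(1.0L), long double>::value, "L suffix gives long double");

const float       f   = 0.70710678118654752440084436210485f;
const double      d   = 0.70710678118654752440084436210485;
const double      df  = 0.70710678118654752440084436210485f; // float literal, then widened
const long double ld  = 0.70710678118654752440084436210485;  // double literal, then widened
const long double ldl = 0.70710678118654752440084436210485L; // long double literal

// Widening float -> double -> long double never changes the value.
bool widening_is_exact() {
    return df == static_cast<double>(f) && ld == static_cast<long double>(d);
}

// The digits dropped when the literal was rounded to float are not restored
// by storing the result in a double.
bool float_literal_lost_digits() {
    return df != d;
}
```

This matches the output in the question: doublef prints the same digits as float, and l double prints the same digits as double.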

I understand that some of these answers may not seem satisfactory, but there is a lot of background reading to do on the relevant standards before you can dismiss them. This answer is already longer than intended, so I won't try to explain everything here.

And as a final note: since the precision is not clearly defined, why not have a constant that is longer than it needs to be? It seems sensible to always define a constant with enough digits to use the full precision of whichever type it ends up in.
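To make the "longer than it needs to be" argument concrete, here is a small sketch assuming IEEE 754 float and double and a compiler that rounds literals to the nearest representable value (common in practice, though the standard leaves some latitude). The constant names are hypothetical. Extra digits cost nothing, because the compiler rounds the literal once to the target type, while trimming too many digits can silently select a neighbouring value.

```cpp
#include <cassert>

// The same 32-digit text, rounded once to each target type.
constexpr float       inv_sqrt2_f = 0.70710678118654752440084436210485f;
constexpr double      inv_sqrt2_d = 0.70710678118654752440084436210485;
constexpr long double inv_sqrt2_l = 0.70710678118654752440084436210485L;

// Shorter spellings. Eight significant digits are enough to pick the same
// float here, and seventeen are enough to pick the same double.
constexpr float  short_f = 0.70710678f;
constexpr double short_d = 0.70710678118654752;

// Sixteen digits are NOT enough for this particular value: the truncated
// text lies closer to the next lower double.
constexpr double too_short_d = 0.7071067811865475;

// Hypothetical helper bundling the claims above.
void check_constants() {
    assert(short_f == inv_sqrt2_f);
    assert(short_d == inv_sqrt2_d);
    assert(too_short_d != inv_sqrt2_d);
}
```

A single long spelling, as in the original .c file, avoids having to work out the minimum digit count for each type.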
