What is the difference between float and double?
Question
I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?
Recommended answer
Huge difference.
As the name implies, a double has 2x the precision of float [1]. In general, a double has about 15 decimal digits of precision, while a float has about 7.
Here's how the number of digits is calculated:
double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits
float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
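The arithmetic above can be reproduced with a few lines of C. This is only a sketch: the names `LOG10_OF_2`, `decimal_digits` and `print_decimal_digits` are made up for illustration, and log10(2) is hard-coded as a constant so that no math library is needed.

```c
#include <stdio.h>

/* log10(2), written out as a constant (an assumption of this sketch,
   chosen so the code does not depend on the math library). */
#define LOG10_OF_2 0.30102999566398120

/* Decimal digits carried by a binary significand of `significand_bits`
   bits (hidden bit included):
   log(2^bits) / log(10) = bits * log10(2). */
double decimal_digits(int significand_bits)
{
    return significand_bits * LOG10_OF_2;
}

/* Print the figures for the two formats discussed above. */
void print_decimal_digits(void)
{
    printf("double: %.2f digits\n", decimal_digits(52 + 1));
    printf("float:  %.2f digits\n", decimal_digits(23 + 1));
}
```

Calling `print_decimal_digits()` from `main` should report roughly 15.95 and 7.22, matching the hand calculation.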
This loss of precision can lead to larger rounding errors accumulating when a calculation is repeated many times, e.g.
float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.7g\n", b); // prints 9.000023
while
double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996
Also, the maximum value of float is about 3.4e38, but that of double is about 1.8e308, so using float can hit "infinity" (i.e. a special floating-point number) much more easily than double for something simple, e.g. computing the factorial of 60.
During testing, a few test cases may contain such huge numbers, which can cause your program to fail if you use floats.
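To make the factorial example concrete, here is a minimal sketch that computes n! once in float and once in double. The function names are hypothetical, and the comparison against `FLT_MAX` is just one simple way to detect that the float result has overflowed to infinity; it assumes float expressions are evaluated in float precision, as on typical x86-64 and ARM builds.

```c
#include <float.h>
#include <stdio.h>

/* n! accumulated in single precision; overflows to +infinity once the
   running product exceeds FLT_MAX. */
float factorial_float(int n)
{
    float result = 1.0f;
    for (int i = 2; i <= n; ++i)
        result *= (float)i;
    return result;
}

/* n! accumulated in double precision. */
double factorial_double(int n)
{
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= (double)i;
    return result;
}

/* Print both results and say whether the float one overflowed. */
void print_factorials(int n)
{
    float f = factorial_float(n);
    double d = factorial_double(n);
    printf("float:  %g%s\n", f, f > FLT_MAX ? " (overflowed)" : "");
    printf("double: %g\n", d);
}
```

With n = 60 the double result is about 8.32e81, far above the float maximum, so the float version ends up as infinity while the double version stays finite.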
Of course, sometimes even double isn't accurate enough, hence we sometimes have long double [1] (the above example gives 9.000000000000000066 on Mac), but all floating-point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use int or a fraction class.
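As an illustration of the integer approach for money, the sketch below adds the same price repeatedly, once as a double in dollars and once as an integer count of cents. The function names and the choice of cents as the unit are assumptions made for this example.

```c
#include <stdio.h>

/* Add `price` (in dollars) `count` times using binary floating point.
   Values such as 0.10 have no exact binary representation, so the
   total can drift away from the exact decimal answer. */
double total_dollars(double price, int count)
{
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += price;
    return total;
}

/* The same total kept as an integer number of cents, which is exact
   as long as the value fits in a long long. */
long long total_cents(long long price_cents, int count)
{
    long long total = 0;
    for (int i = 0; i < count; ++i)
        total += price_cents;
    return total;
}

/* Show both totals for ten items at 10 cents each. */
void print_totals(void)
{
    printf("double: %.17g\n", total_dollars(0.10, 10));
    printf("cents:  %lld\n", total_cents(10, 10));
}
```

On a typical IEEE binary64 platform the double total comes out slightly below 1.0, while the integer total is exactly 100 cents.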
Furthermore, don't use += to sum lots of floating-point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try to implement the Kahan summation algorithm.
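For reference, a minimal sketch of Kahan (compensated) summation in C, next to a plain += loop for comparison. The function names are illustrative. The sketch assumes the compiler does not reassociate floating-point arithmetic: options such as -ffast-math can optimize the compensation term away.

```c
#include <stddef.h>

/* Plain running sum: every += can discard low-order bits of the term. */
float naive_sum(const float *values, size_t count)
{
    float sum = 0.0f;
    for (size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}

/* Kahan summation: `compensation` tracks the low-order bits lost in
   the previous addition and feeds them back into the next term. */
float kahan_sum(const float *values, size_t count)
{
    float sum = 0.0f;
    float compensation = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float corrected = values[i] - compensation; /* re-apply lost bits */
        float next = sum + corrected;               /* low bits may be lost here */
        compensation = (next - sum) - corrected;    /* recover what was lost */
        sum = next;
    }
    return sum;
}
```

Applied to an array holding 729 copies of `1.f / 81`, the compensated sum should land much closer to 9 than the plain loop does.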
[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating-point number (binary32), and double is an IEEE double-precision floating-point number (binary64).
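Since the representation is implementation-defined, it can be worth checking what a given compiler actually provides. The sketch below compares the `<float.h>` parameters with the binary32 and binary64 layouts. The function names are hypothetical, and matching these parameters is only a heuristic: it does not prove full IEEE 754 conformance.

```c
#include <float.h>
#include <limits.h>

/* Nonzero if float has the size, radix, significand width and exponent
   range of IEEE binary32 (24 significand bits, hidden bit included). */
int float_looks_like_binary32(void)
{
    return FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128 &&
           sizeof(float) * CHAR_BIT == 32;
}

/* Nonzero if double has the size, radix, significand width and exponent
   range of IEEE binary64 (53 significand bits, hidden bit included). */
int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024 &&
           sizeof(double) * CHAR_BIT == 64;
}
```

On the mainstream platforms listed in the footnote both functions are expected to return nonzero.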