What is the difference between float and double?

Question

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?

Answer

Huge difference.

As the name implies, a double has roughly twice the precision of a float[1]. In general, a double carries about 15 significant decimal digits of precision, while a float carries about 7.

Here's how the number of digits is calculated:

double has 52 mantissa bits + 1 hidden bit: $\log(2^{53}) \div \log(10) \approx 15.95$ digits

float has 23 mantissa bits + 1 hidden bit: $\log(2^{24}) \div \log(10) \approx 7.22$ digits

This lower precision can cause larger rounding errors to accumulate when calculations are repeated, e.g.

```c
#include <stdio.h>

int main(void) {
    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;
    printf("%.7g\n", b); // prints 9.000023
    return 0;
}
```

whereas

```c
#include <stdio.h>

int main(void) {
    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++i)
        b += a;
    printf("%.15g\n", b); // prints 8.99999999999996
    return 0;
}
```

Also, the maximum value of a float is about 3.4e38, while that of a double is about 1.8e308, so float hits "infinity" (a special floating-point value) much more easily than double, even for something as simple as computing the factorial of 60.

During testing, a few test cases may contain such huge numbers, which can make your program fail if you use floats.

Of course, sometimes even double isn't accurate enough, which is why we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac). But all floating-point types suffer from round-off errors, so if precision is very important (e.g. money processing), you should use int or a fraction class.

Furthermore, don't use += to sum lots of floating-point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try implementing the Kahan summation algorithm.

[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible for all three to be implemented as IEEE double precision. Nevertheless, on most architectures and compilers (gcc, MSVC; x86, x64, ARM), float is indeed an IEEE single-precision floating-point number (binary32), and double is an IEEE double-precision floating-point number (binary64).
