What is the difference between float and double?

Question

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are floats and doubles interchangeable? What are the differences between them?

Recommended answer

Huge difference.

As the name implies, a double has roughly twice the precision of a float[1]: 53 significand bits versus 24. In general, a double has about 15 decimal digits of precision, while a float has about 7.

Here's how the number of digits is calculated:

double has 52 mantissa bits + 1 hidden bit: log(2^53) ÷ log(10) = 15.95 digits

float has 23 mantissa bits + 1 hidden bit: log(2^24) ÷ log(10) = 7.22 digits
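The arithmetic above can be reproduced with a small helper. This is only a sketch: the names decimal_digits, float_digits and double_digits are made up for illustration, and it assumes the significand widths reported by <float.h> are the IEEE ones (24 and 53), which the C standard does not guarantee.

```c
#include <assert.h>
#include <float.h>

/* log10(2), written out so the sketch needs no math library. */
#define LOG10_OF_2 0.30102999566398120

/* Decimal digits carried by a significand of `bits` binary digits,
   hidden bit included: log(2^bits) / log(10) = bits * log10(2). */
double decimal_digits(int bits)
{
    assert(bits > 0);
    return bits * LOG10_OF_2;
}

/* Digits for this platform's float; FLT_MANT_DIG is 24 for binary32. */
double float_digits(void)
{
    return decimal_digits(FLT_MANT_DIG);
}

/* Digits for this platform's double; DBL_MANT_DIG is 53 for binary64. */
double double_digits(void)
{
    return decimal_digits(DBL_MANT_DIG);
}
```

Printing float_digits() and double_digits() with "%.2f" should show values close to the 7.22 and 15.95 quoted above on a platform that uses binary32 and binary64.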

This precision loss can lead to larger rounding errors accumulating when a calculation is repeated, e.g.

float a = 1.f / 81;
float b = 0;
for (int i = 0; i < 729; ++ i)
    b += a;
printf("%.7g\n", b); // prints 9.000023

while

double a = 1.0 / 81;
double b = 0;
for (int i = 0; i < 729; ++ i)
    b += a;
printf("%.15g\n", b); // prints 8.99999999999996

Also, the maximum value of float is about 3.4e38, but that of double is about 1.8e308, so a float can reach "infinity" (a special floating-point value) much more easily than a double in something as simple as computing the factorial of 60.

During testing, a few test cases may contain numbers this large, which can cause your program to fail if you use floats.
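The factorial case can be checked with a sketch like the following. The function names are hypothetical, and the behaviour described assumes float is IEEE binary32 and double is IEEE binary64 with arithmetic carried out in the declared type, as on typical x86-64 builds.

```c
#include <assert.h>
#include <float.h>

/* n! accumulated in single precision. */
float factorial_float(int n)
{
    assert(n >= 0);
    float result = 1.0f;
    for (int i = 2; i <= n; ++i)
        result *= (float)i;
    return result;
}

/* n! accumulated in double precision. */
double factorial_double(int n)
{
    assert(n >= 0);
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= (double)i;
    return result;
}

/* Nonzero once the value has gone past the largest finite float. */
int overflowed_float(float x)
{
    return x > FLT_MAX;
}
```

Under those assumptions, factorial_float(60) has already overflowed to infinity (35! is about 1e40, beyond the float range), while factorial_double(60) is still a finite value of roughly 8.3e81.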

Of course, sometimes even double isn't accurate enough, hence we sometimes have long double[1] (the above example gives 9.000000000000000066 on Mac), but all floating-point types suffer from round-off errors, so if precision is very important (e.g. money processing) you should use an integer type or a fraction class.
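As a rough illustration of the money case, the sketch below adds the same payment of 0.10 repeatedly, once as a double and once as a whole number of cents. Both function names are invented for this example, and the double result assumes IEEE binary64 arithmetic.

```c
#include <assert.h>

/* Adds `count` payments of 0.10 in double arithmetic.
   0.10 has no exact binary representation, so error creeps in. */
double total_in_double(int count)
{
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += 0.10;
    return total;
}

/* Adds the same payments as integer cents, which is exact
   as long as the total fits in a long. */
long total_in_cents(int count)
{
    assert(count >= 0);
    long total = 0;
    for (int i = 0; i < count; ++i)
        total += 10;
    return total;
}
```

With ten payments the integer version returns exactly 100 cents, whereas the double version typically ends up a hair away from 1.0, so an equality check against 1.0 fails.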

Furthermore, don't use += to sum lots of floating-point numbers, as the errors accumulate quickly. If you're using Python, use math.fsum. Otherwise, try to implement the Kahan summation algorithm.
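For C, a minimal sketch of Kahan (compensated) summation might look like this. The function name kahan_sum is chosen here for illustration, and the code assumes the compiler does not reorder floating-point arithmetic (for example, it is not built with -ffast-math), since that would optimise the compensation away.

```c
#include <assert.h>
#include <stddef.h>

/* Sums `count` floats while carrying a correction term that
   recovers the low-order bits lost by each addition. */
float kahan_sum(const float *values, size_t count)
{
    assert(values != NULL || count == 0);
    float sum = 0.0f;
    float compensation = 0.0f;  /* estimate of the error accumulated so far */
    for (size_t i = 0; i < count; ++i) {
        float y = values[i] - compensation;  /* apply the pending correction */
        float t = sum + y;                   /* low bits of y may be lost here */
        compensation = (t - sum) - y;        /* what was actually lost */
        sum = t;
    }
    return sum;
}
```

Filling an array with 729 copies of 1.f / 81 and passing it to kahan_sum should give a result much closer to 9 than the plain += loop shown earlier, while still using only float.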

[1]: The C and C++ standards do not specify the representation of float, double and long double. It is possible that all three are implemented as IEEE double-precision. Nevertheless, on most compilers and architectures (gcc, MSVC; x86, x64, ARM) float is indeed an IEEE single-precision floating-point number (binary32), and double is an IEEE double-precision floating-point number (binary64).
