Dealing with very small numbers in C++

Views: 107 Published: 2019/6/11 0:26:35 C++


Problem description

I'm dealing with code that uses very small numbers, of the order of 10^-15 to 10^-25. I tried using double and long double, but I get wrong answers: either 0.000000000000000000001 is rounded off to 0, or a number like 0.00000000000000002 is represented as 0.00000000000000001999999999999. Since even a small fraction of 1/1000000 makes a significant difference in my final answers, please suggest an appropriate fix. Thank you.

What I have tried:

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <iomanip>
using namespace std;

int main()
{
    double sum, a, b, c, d;
    a = 1;
    b = 1 * pow(10, -15);
    c = 2 * pow(10, -14);
    d = 3 * pow(10, -14);
    sum = a + b + c + d;
    cout << fixed;
    cout << setprecision(30);
    cout << " a   : " << a << endl << " b   : " << b << endl << " c   : " << c << endl
         << " d   : " << d << endl;
    cout << " sum : " << sum << endl << endl;
    // Normalize each value by the total and sum again.
    a = a / sum;
    b = b / sum;
    c = c / sum;
    d = d / sum;
    sum = a + b + c + d;
    cout << " a   : " << a << endl << " b   : " << b << endl << " c   : " << c << endl
         << " d   : " << d << endl;
    cout << " sum2: " << sum << endl;
    return 0;
}

The expected output should be
a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000030000000000000000
sum : 1.000000000000051000000000000000

a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000030000000000000000
sum1: 1.000000000000051000000000000000

But the output I get is
a : 1.000000000000000000000000000000
b : 0.000000000000001000000000000000
c : 0.000000000000020000000000000000
d : 0.000000000000029999999999999998
sum : 1.000000000000051100000000000000

a : 0.999999999999998787999878998887
b : 0.000000000000000999999997897899
c : 0.000000000000019999999999999458
d : 0.000000000000029999999999996589
sum1: 0.999999999999989000000000000000
I tried double, long double and even boost_dec_float, but the output I get is similar.

Solution

Most decimal values cannot be represented exactly in binary floating point, so the stored values differ slightly from the real values. A double holds about 16 significant decimal digits (15 are guaranteed to survive, 17 are needed to print a value so that it reads back unchanged). That means, counting from the first non-zero digit, all digits more than about 16 positions to the right are not relevant: when printed, they are artifacts of the binary representation, not part of the value you intended.
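The digit counts can be queried from the standard library instead of being memorized. The following is a minimal sketch, assuming an IEEE 754 double; `with_digits` is a hypothetical helper name used only for illustration.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Decimal digits a double is guaranteed to preserve (15 for IEEE 754 binary64).
constexpr int kGuaranteedDigits = std::numeric_limits<double>::digits10;
// Decimal digits needed so that a printed double reads back unchanged (17).
constexpr int kRoundTripDigits = std::numeric_limits<double>::max_digits10;

// Hypothetical helper: print x with the given number of significant digits.
std::string with_digits(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;
    return out.str();
}
```

With 15 digits, `with_digits(0.1, 15)` gives `0.1`; with 17 digits the same value prints as `0.10000000000000001`, which exposes the stored approximation rather than a computation error.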

Performing operations on floating point values introduces further errors. If you, for example, add a smaller value to a larger one, the precision of the result is determined by the larger one:
  1.000 000 000 000 000 000
+ 0.000 000 000 000 001 xxx yyy
= 1.000 000 000 000 001 zzz
The digits marked with x and y in the above example are lost, and the digits marked with z are not meaningful (zero in the above example).
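The loss of the low digits can be observed directly. This sketch assumes IEEE 754 doubles evaluated in double precision (the usual case on x86-64); `is_absorbed` and `stored_increment` are illustrative names, not part of any library.

```cpp
#include <limits>

// Gap between 1.0 and the next larger double: 2^-52, about 2.2e-16.
constexpr double kEpsilon = std::numeric_limits<double>::epsilon();

// True when `small` is too small to change `large` at all.
bool is_absorbed(double large, double small) {
    double sum = large + small;
    return sum == large;
}

// What actually gets added to 1.0 when you ask for 1.0 + small.
double stored_increment(double small) {
    double sum = 1.0 + small;
    return sum - 1.0;
}
```

Under those assumptions, `is_absorbed(1.0, 1e-17)` is true, while `stored_increment(1e-15)` returns `5 * kEpsilon` (about 1.11e-15) rather than 1e-15, because results near 1.0 can only move in steps of `kEpsilon`.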

Performing more operations with the result will increase the errors. That is what you are seeing.

The only solution when the errors are too large is to use a more precise number format. While long double might be used, you should check whether your platform actually gives it more precision than double. With Microsoft Visual C++, for example, it does not: the long double type has the same representation as double.
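One way to check is to compare the mantissa widths reported by `std::numeric_limits`. This is a small sketch; the constant and function names are chosen here for illustration, and the values in the comments are the common cases, not a complete list.

```cpp
#include <limits>

// Binary mantissa digits of double: 53 for IEEE 754 binary64.
constexpr int kDoubleBits = std::numeric_limits<double>::digits;
// Binary mantissa digits of long double. Commonly 64 (x87 extended precision),
// 113 (IEEE quad), or 53 where long double is the same format as double.
constexpr int kLongDoubleBits = std::numeric_limits<long double>::digits;

// True only if long double really carries more precision than double.
constexpr bool long_double_is_wider() {
    return kLongDoubleBits > kDoubleBits;
}
```

If `long_double_is_wider()` is false on your toolchain, switching the declarations to long double changes nothing about the results.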

You should also know about (and probably use) the scientific format for floating point numbers.

You can use it, for example, to replace the pow() calls:
//b=1*pow(10,-15);
//c=2*pow(10,-14);
//d=3*pow(10,-14);
b = 1e-15;
c = 2e-14;
d = 3e-14;
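A literal in scientific notation is converted by the compiler, while pow() is evaluated by the math library at run time. Whether the two agree to the last bit depends on the library, so the sketch below measures the difference instead of assuming one. The names are illustrative only.

```cpp
#include <cmath>

// Literal in scientific notation, converted to a double by the compiler.
constexpr double kLiteral = 1e-15;
// The same value written as a long double literal (note the L suffix).
constexpr long double kLongLiteral = 1e-15L;

// The pow() variant from the question, evaluated at run time.
double via_pow() {
    return 1 * std::pow(10.0, -15);
}

// Relative difference between two non-zero values.
double relative_difference(double x, double y) {
    return std::fabs(x - y) / std::fabs(y);
}
```

Comparing `via_pow()` with `kLiteral` through `relative_difference` shows any disagreement as a relative error, which is easier to judge than a long row of decimal digits.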
The scientific format can also be used when printing values with the printf function. Such output is often more readable than a lot of trailing or leading zeroes. The G format is especially useful (it uses the scientific format only for very small and very large numbers):
printf("d: %.16G\n", d);
In the above example the precision is limited to 16 digits so that non-relevant digits won't be printed.
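To use the same formatting outside of a direct printf call, the format string can be wrapped in a helper. This is a minimal sketch; `format_g16` is a hypothetical name, and the two-digit exponent shown in the note is what current C libraries print.

```cpp
#include <cstdio>
#include <string>

// Format x like printf("%.16G", x): at most 16 significant digits,
// scientific notation only when the exponent is below -4 or at least 16.
std::string format_g16(double x) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.16G", x);
    return buffer;
}
```

With this helper, `format_g16(3e-14)` yields `3E-14` instead of the `0.000000000000029999999999999998` shown in the question, because the 17th digit is rounded away.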

So you should try the above formatting in your program. If the results are then as expected (because non-relevant digits are no longer printed and the output is rounded instead), all is OK.
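Applied to the normalization step from the question, that advice might look like the sketch below. It assumes that agreement to within a small tolerance is what "as expected" means; the tolerance of 1e-14 and all function names are assumptions made for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdio>

// Divide each value by the total and return the sum of the normalized values,
// which should be 1 up to rounding error.
double normalize(std::array<double, 4>& values) {
    double total = 0.0;
    for (double v : values) total += v;

    double check = 0.0;
    for (double& v : values) {
        v /= total;
        check += v;
    }
    return check;
}

// Compare against 1 with a tolerance instead of expecting every digit to match.
bool close_to_one(double x, double tolerance = 1e-14) {
    return std::fabs(x - 1.0) <= tolerance;
}

// Print a named value with 16 significant digits.
void print_value(const char* name, double x) {
    std::printf("%s: %.16G\n", name, x);
}
```

Calling `normalize` on `{1.0, 1e-15, 2e-14, 3e-14}` and passing the returned sum to `close_to_one` checks the result without relying on digits beyond the precision of a double.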

Have a look at Boost's float128 (Boost 1.63.0).
