Use of %d gives strange rounding values in an Awk program


Question

I get a strange answer when I sum a certain set of records. In one case I print the sum with %d, and in the other case I print it without %d.

The first expression prints the sum using %d:

 awk -F"|" '(NR > 0 && NR < 36) {sum +=$150} END {printf ("%d\n",sum)}' muar.txt
-|33

Without %d:

 awk -F"|" '(NR > 0 && NR < 36) {sum +=$150} END {printf ("\n"sum)}' muar.txt
-|34

Why is it rounding from 34 down to 33?

To add more information: up to row 34 the sum is 33.03, and row 35 has the value 0.97, so the result should be 34 rather than 33.

Additional detail for testing, as requested in the comments: you can create a file, say a.txt, that has only one field. The first value is blank, the second is 1.95, then 0.97 18 times in a row, then 0.98, then 0.97 6 times, then 0.98, then 0.97 3 times, then 0.98 2 times, then 0.97 2 times.

Alternatively, you can use 1.95 once, 0.97 29 times, and 0.98 4 times, one value per line.
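
A minimal sketch for reproducing the problem with the simpler layout (1.95 once, 0.97 29 times, 0.98 4 times, preceded by one blank line). The function names make_test_file, sum_with_d and sum_without_d are hypothetical, and field 1 is used instead of field 150 because the test file has a single field; neither is part of the original commands.

```shell
# Write the test data: one blank line, then 1.95, 29 x 0.97 and 4 x 0.98 (35 lines).
make_test_file() {
    awk 'BEGIN {
        print ""
        print "1.95"
        for (i = 1; i <= 29; i++) print "0.97"
        for (i = 1; i <= 4; i++)  print "0.98"
    }' > "$1"
}

# Sum field 1 and print it with %d (assumption: single-field file, so field 1 replaces field 150).
sum_with_d() {
    awk -F"|" '{ sum += $1 } END { printf("%d\n", sum) }' "$1"
}

# Same sum, printed through string concatenation instead of %d.
sum_without_d() {
    awk -F"|" '{ sum += $1 } END { printf("%s\n", "" sum) }' "$1"
}

make_test_file a.txt
sum_with_d a.txt
sum_without_d a.txt
```

With these values the first call should print 33 and the second 34, which is the discrepancy described above.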

Recommended answer

The answer to your question is twofold:

  • There is a numeric problem
  • awk does some internal conversion

One of your examples was 1.95 + 29*0.97 + 4*0.98. We can all agree that this sum is exactly 34. The little awk program below does the computation in two different ways, leading to remarkable results:

awk 'BEGIN{sum1=1.95 + 29*0.97 + 4*0.98
           sum2=1.95;
           for(i=1;i<=29;i++){sum2+=0.97};
           for(i=1;i<=4;i++) {sum2+=0.98};

           printf "full precision     : %25.16f%25.16f\n",sum1,sum2
           printf "integer conversion : %25d%25d\n"      ,sum1,sum2
           printf "string conversion  : "sum1" "sum2"\n"
}'

which leads to the following output (first column sum1, second column sum2):

full precision     :       34.0000000000000000      33.9999999999999787
integer conversion :                        34                       33
string conversion  : 34 34

Why do the two sums give different results?

In essence, the three numbers 1.95, 0.97 and 0.98 cannot be represented exactly in a binary floating-point format. Each one is stored as an approximation:

1.95 ~ 1.94999999999999995559107901499...
0.97 ~ 0.96999999999999997335464740899...
0.98 ~ 0.97999999999999998223643160599...
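
One way to inspect these approximations yourself is to print the values with more digits than the default format shows. The sketch below assumes an awk whose printf hands the conversion to a C library that prints the exact binary value (this is typical, but not guaranteed everywhere); the helper name show_stored is hypothetical.

```shell
# Print a decimal value with 29 digits after the point to expose what is actually stored.
show_stored() {
    awk -v x="$1" 'BEGIN { printf "%.29f\n", x + 0 }'
}

show_stored 1.95
show_stored 0.97
show_stored 0.98
```

The last printed digit may be rounded, so it can differ slightly from a truncated listing of the same value.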

When the numbers are summed one by one, as is done for sum2, the errors of the 33 additions accumulate and lead to the final result:

sum2 = 33.99999999999997868371792719699...

The error on sum1 is much smaller than the error on sum2, as we only do 2 multiplications and 2 additions. In fact, the error vanishes and we get the correct result (i.e. the error is smaller than 10^-17):

   1.95 ~  1.94999999999999995559107901499...
29*0.97 ~ 28.12999999999999900524016993586...
 4*0.98 ~  3.91999999999999992894572642399...
   sum1 ~ 34.00000000000000000000000000000...

For a detailed understanding of the above, I refer to the obligatory article What Every Computer Scientist Should Know About Floating-Point Arithmetic.

What happens in the print statements?

awk is essentially doing internal conversions:

  • printf "%d" requests an integer, but it is given a float. awk receives sum2 and converts it to an integer by removing the fractional part of the number, as if it were fed through int(). Thus 33.99999... is converted to 33.

  • printf ""sum2 is a conversion from a float to a string. Concatenating a string with a number forces the number to be converted to a string. If the number is a pure integer, it is converted as a pure integer. However, sum2 is a float.

The conversion of sum2 to a string is done internally with sprintf(CONVFMT,sum2), where CONVFMT is an awk built-in variable that defaults to %.6g. Thus sum2 is by default rounded to at most 6 significant digits. Hence ""sum2 -> "34".
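
If the goal is a rounded integer rather than a truncated one, two common options are the %.0f format and int(x + 0.5). The sketch below is an illustration and not part of the original answer: it rebuilds sum2 from the same numbers and also raises CONVFMT to show the digits that the default %.6g hides. The function name round_demo is hypothetical. Note that int(x + 0.5) only rounds as expected for non-negative values, and that %.0f typically rounds exact halves to the nearest even integer.

```shell
# Compare truncation (%d) with two ways of rounding to the nearest integer.
round_demo() {
    awk 'BEGIN {
        sum2 = 1.95
        for (i = 1; i <= 29; i++) sum2 += 0.97
        for (i = 1; i <= 4; i++)  sum2 += 0.98

        printf "%%d          : %d\n", sum2             # truncates toward zero
        printf "%%.0f        : %.0f\n", sum2           # rounds to nearest
        printf "int(x + 0.5): %d\n", int(sum2 + 0.5)  # rounds, for x >= 0 only

        CONVFMT = "%.17g"                              # keep 17 significant digits
        printf "CONVFMT=%%.17g: %s\n", "" sum2         # string conversion now exposes the error
    }'
}

round_demo
```

The first line should show 33 and the next two 34; with the larger CONVFMT the string conversion no longer hides the value just below 34.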

Can we improve sum2?

Yes! sum2 is nothing more than a representation of a sequence of numbers we want to add. It is not really practical to search for all the common terms first and then use multiplications as is done in sum1. An improvement can be achieved using Kahan summation. The idea behind it is to keep track of a compensation term representing the digits you lost.

The following program demonstrates it:

awk 'BEGIN{sum2=1.95;
           for(i=1;i<=29;i++){sum2+=0.97};
           for(i=1;i<=4;i++) {sum2+=0.98};
           sum3=1.95; c=0
           for(i=1;i<=29;i++) { y = 0.97 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t }
           for(i=1;i<=4;i++)  { y = 0.98 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t }

           printf "full precision     : %25.16f%25.16f\n",sum2,sum3
           printf "integer conversion : %25d%25d\n"      ,sum2,sum3
           printf "string conversion  : "sum2" "sum3"\n"
}'

which leads to the following output (first column sum2, second column sum3):

full precision     :       33.9999999999999787      34.0000000000000000
integer conversion :                        33                       34
string conversion  : 34 34
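
The same compensation can be applied when the values come from a file, as in the question. The sketch below is one possible adaptation, not the original poster's program: the function name kahan_sum, the col variable and the file name kahan_demo.txt are assumptions, the field separator is kept as | to match the question, and the NR filter from the question is left out for brevity.

```shell
# Kahan-compensated sum of one column; prints the result with %d.
# Usage: kahan_sum FILE [COLUMN]   (COLUMN defaults to 1)
kahan_sum() {
    awk -F"|" -v col="${2:-1}" '
        {
            y = $col - c          # next value, corrected by the running compensation
            t = sum + y           # low-order digits of y may be lost here
            c = (t - sum) - y     # recover what was lost (with opposite sign)
            sum = t
        }
        END { printf("%d\n", sum) }
    ' "$1"
}

# Example data (assumption): the numbers from the answer, one per line.
awk 'BEGIN {
    print "1.95"
    for (i = 1; i <= 29; i++) print "0.97"
    for (i = 1; i <= 4; i++)  print "0.98"
}' > kahan_demo.txt

kahan_sum kahan_demo.txt
```

For the file in the question, the call would presumably look like kahan_sum muar.txt 150.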

If you want to see the intermediate steps and the difference between sum2 and sum3, you can check out the following code:

 awk 'BEGIN{ sum2=sum3=1.95;c=0;
             for(i=1;i<=29;i++) {
                sum2+=0.97;
                y = 0.97 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t;
                printf "%25.16f%25.16f%25.16e\n", sum2,sum3,c
             }
             for(i=1;i<=4;i++) {
                sum2+=0.98;
                y = 0.98 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t;
                printf "%25.16f%25.16f%25.16e\n", sum2,sum3,c
             }
      }'
