Use of %d is giving strange rounding values in an Awk program
Question
I am getting a strange answer when I sum a certain set of records. In one case I am not using %d, and in the other case I am.
The first expression computes the sum and prints it with %d:
awk -F"|" '(NR > 0 && NR < 36) {sum +=$150} END {printf ("%d\n",sum)}' muar.txt
-|33
Without %d:
awk -F"|" '(NR > 0 && NR < 36) {sum +=$150} END {printf ("\n"sum)}' muar.txt
-|34
Why is it rounding from 34 down to 33?
Just to add more information: up to row 34 the sum is 33.03, and row 35 has the value 0.97, so the result should be 34 rather than 33.
Additional detail for testing, as requested in the comments: you can create a file, say a.txt, with only one field. The first value is blank, the second is 1.95, then 0.97 18 times in a row, then 0.98, then 0.97 6 times, then 0.98, then 0.97 3 times, then 0.98 twice, then 0.97 twice.
Or you can have 1.95 once, 0.97 29 times, and 0.98 4 times, one value per line.
Recommended answer
The answer to your question is twofold:
- There is a numeric problem.
- awk does some internal conversion.
One of your examples was 1.95 + 29*0.97 + 4*0.98. We can all agree that this sum is exactly 34. The little awk program below does the computation in two different ways, leading to remarkable results:
awk 'BEGIN{sum1=1.95 + 29*0.97 + 4*0.98
sum2=1.95;
for(i=1;i<=29;i++){sum2+=0.97};
for(i=1;i<=4;i++) {sum2+=0.98};
printf "full precision : %25.16f%25.16f\n",sum1,sum2
printf "integer conversion : %25d%25d\n" ,sum1,sum2
printf "string conversion : "sum1" "sum2"\n"
}'
which leads to the following output (first column sum1, second column sum2):
full precision : 34.0000000000000000 33.9999999999999787
integer conversion : 34 33
string conversion : 34 34
Why do the two sums give different results?
In essence, the three numbers 1.95, 0.97 and 0.98 cannot be represented exactly in binary floating point. Each is stored as the nearest representable value:
1.95 ~ 1.94999999999999995559107901499...
0.97 ~ 0.96999999999999997335464740899...
0.98 ~ 0.97999999999999998223643160599...
When they are summed one at a time, as is done for sum2, the errors of the 33 additions accumulate and lead to the final result:
sum2 = 33.99999999999997868371792719699...
The error on sum1 is much smaller than the error on sum2, as we only do 2 multiplications and 2 additions. In fact, the error evaporates and we get the correct result (i.e. the error is smaller than 10^-17):
1.95 ~ 1.94999999999999995559107901499...
29*0.97 ~ 28.12999999999999900524016993586...
4*0.98 ~ 3.91999999999999992894572642399...
sum1 ~ 34.00000000000000000000000000000...
For a detailed understanding of the above, I refer to the obligatory article "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
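The approximations above can be inspected directly from awk. The following is a minimal sketch, assuming an awk whose printf prints the stored double to the requested number of digits (the trailing digits may vary between implementations); the shell variable name out is only illustrative.

```shell
out=$(awk 'BEGIN {
    # The stored doubles nearest to the decimal literals.
    printf "%.20f\n%.20f\n%.20f\n", 1.95, 0.97, 0.98

    # Rebuild sum2 with 33 separate additions.
    sum2 = 1.95
    for (i = 1; i <= 29; i++) sum2 += 0.97
    for (i = 1; i <= 4;  i++) sum2 += 0.98

    # An exact comparison with 34 fails; the gap is tiny but nonzero.
    printf "%s %.3e\n", (sum2 == 34 ? "equal" : "not equal"), 34 - sum2
}')
echo "$out"
```

The last line reports "not equal" together with the small remaining gap, which is why any output format that truncates instead of rounding can show 33.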
What is happening in the print statements?
awk is essentially doing internal conversions:
- printf "%d" requests an integer, but it is served a float. awk receives sum2 and converts it to an integer by removing the fractional part of the number, or you could imagine that it feeds it through int(). Thus 33.99999... is converted to 33.
- printf ""sum2 is a conversion from a float to a string. Concatenating a string with a number means the number has to be converted to a string. If the number is a pure integer, it is simply converted as a pure integer. However, sum2 is a float.
The conversion of sum2 to a string is done internally with sprintf(CONVFMT,sum2), where CONVFMT is an awk built-in variable which is set to %.6g by default. Thus sum2 is by default rounded so that it is represented with at most 6 significant digits. Hence ""sum2 -> "34".
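To see the conversions side by side, here is a small sketch that rebuilds sum2 as above and formats it in several ways. Using %.0f to round instead of truncate is a suggestion added here, not part of the original answer, and the variable names (truncated, rounded, as_string, as_long_string, out) are hypothetical.

```shell
out=$(awk 'BEGIN {
    sum2 = 1.95
    for (i = 1; i <= 29; i++) sum2 += 0.97
    for (i = 1; i <= 4;  i++) sum2 += 0.98

    truncated = sprintf("%d", sum2)     # fractional part is dropped
    rounded   = sprintf("%.0f", sum2)   # rounded to the nearest integer
    as_string = "" sum2                 # uses CONVFMT, %.6g by default
    print truncated, rounded, as_string

    CONVFMT = "%.17g"                   # ask for more significant digits
    as_long_string = "" sum2
    print as_long_string
}')
echo "$out"
```

The first line shows 33 for the truncating %d and 34 for both the rounding %.0f and the default string conversion. With the larger CONVFMT the string conversion no longer hides the accumulated error, so the second line is not a clean 34.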
Can we improve sum2?
Yes! sum2 is nothing more than a representation of a sequence of numbers we want to add. It is not really practical to search for all the common terms first and then use multiplications as is done in sum1. An improvement can be achieved using Kahan summation. The idea behind it is to keep track of a compensation term that represents the digits you lost.
The following program demonstrates it:
awk 'BEGIN{sum2=1.95;
for(i=1;i<=29;i++){sum2+=0.97};
for(i=1;i<=4;i++) {sum2+=0.98};
sum3=1.95; c=0
for(i=1;i<=29;i++) { y = 0.97 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t }
for(i=1;i<=4;i++) { y = 0.98 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t }
printf "full precision : %25.16f%25.16f\n",sum2,sum3
printf "integer conversion : %25d%25d\n" ,sum2,sum3
printf "string conversion : "sum2" "sum3"\n"
}'
which leads to the following output (first column sum2, second column sum3):
full precision : 33.9999999999999787 34.0000000000000000
integer conversion : 33 34
string conversion : 34 34
If you want to see the intermediate steps and the difference between sum2 and sum3, you can check out the following code:
awk 'BEGIN{ sum2=sum3=1.95;c=0;
for(i=1;i<=29;i++) {
sum2+=0.97;
y = 0.97 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t;
printf "%25.16f%25.16f%25.16e\n", sum2,sum3,c
}
for(i=1;i<=4;i++) {
sum2+=0.98;
y = 0.98 - c; t = sum3 + y; c = (t - sum3) - y; sum3 = t;
printf "%25.16f%25.16f%25.16e\n", sum2,sum3,c
}
}'
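The same compensated summation can be applied while reading a file, which is closer to the original question. This is only a sketch under stated assumptions: the temporary file stands in for the one-column test file described in the question (1.95 once, 0.97 29 times, 0.98 4 times), the field number $1 replaces $150, and the result is printed with %.0f so that it is rounded rather than truncated.

```shell
# Build a hypothetical one-column test file.
tmp=$(mktemp)
awk 'BEGIN {
    print "1.95"
    for (i = 1; i <= 29; i++) print "0.97"
    for (i = 1; i <= 4;  i++) print "0.98"
}' > "$tmp"

# Kahan summation over the first field of every record.
out=$(awk -F"|" '{
    y = $1 - c          # next value, corrected by the error carried so far
    t = sum + y         # new running total; low-order digits may be lost
    c = (t - sum) - y   # recover what was lost in that addition
    sum = t
}
END { printf "%.0f\n", sum }' "$tmp")
echo "$out"
rm -f "$tmp"
```

Compensated summation keeps the running total very close to the true sum, but it does not promise an exactly integral result for every input order, so rounding at output time is the more robust choice when an integer is wanted.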