Rules-of-thumb for minimising floating-point errors in C?


Problem Description


Regarding minimising the error in floating-point operations, if I have an operation such as the following in C:

float a = 123.456;
float b = 456.789;
float r = 0.12345;
a = a - (r * b);

Will the result of the calculation change if I split the multiplication and subtraction steps out, i.e.:

float c = r * b;
a = a - c;

I am wondering whether a CPU would treat these calculations differently, and whether the error might therefore be smaller in one case?

If not, which I presume anyway, are there any good rules-of-thumb to mitigate floating-point error? Can I massage the data in a way that will help?

Please don't just say "use higher precision" - that's not what I'm after.

EDIT

For information about the data: in a general sense, errors seem to be worse when the operation results in a very large number like 123456789. Small numbers, such as 1.23456789, seem to yield more accurate results after operations. Am I imagining this, or would scaling larger numbers help accuracy?

Solution

Note: this answer starts with a lengthy discussion of the distinction between a = a - (r * b); and float c = r * b; a = a - c; with a C99-compliant compiler. The part of the question about the goal of improving accuracy while avoiding extended precision is covered at the end.

Extended floating-point precision for intermediate results

If your C99 compiler defines FLT_EVAL_METHOD as 0, then the two computations can be expected to produce exactly the same result. If the compiler defines FLT_EVAL_METHOD to 1 or 2, then a = a - (r * b); will be more precise for some values of a, r and b, because all intermediate computations will be done at an extended precision (double for the value 1 and long double for the value 2).

The program cannot set FLT_EVAL_METHOD, but you can use command-line options to change the way your compiler computes with floating-point, and that will make it change its definition accordingly.
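For illustration, a trivial program can print the value your compiler ended up defining, so you can check the effect of those options (a minimal sketch, not part of the original answer):

#include <float.h>
#include <stdio.h>

/* Print the evaluation method the compiler uses for intermediate results:
    0 = operations are evaluated in the type of the operands,
    1 = float and double operations are evaluated in double,
    2 = all operations are evaluated in long double,
   -1 = indeterminable. */
int main(void)
{
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}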

Contraction of some intermediate results

Depending on whether you use the C99 pragma STDC FP_CONTRACT in your program, and on your compiler's default value for this pragma, some compound floating-point expressions can be contracted into single instructions that behave as if the intermediate result were computed with infinite precision. This happens to be a possibility for your example when targeting a modern processor, as the fused-multiply-add instruction will compute a directly and as accurately as allowed by the floating-point type.

However, you should bear in mind that contraction takes place only at the compiler's option, without any guarantees. The compiler uses the FMA instruction to optimize for speed, not accuracy, so the transformation may not take place at lower optimization levels. Sometimes several transformations are possible (e.g. a * b + c * d can be computed either as fmaf(c, d, a*b) or as fmaf(a, b, c*d)), and the compiler may choose one or the other.
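To see concretely that these are two different computations, the following sketch (with arbitrary, purely illustrative values) evaluates both contractions explicitly; depending on the inputs, the two results can differ in the last bit:

#include <math.h>
#include <stdio.h>

int main(void)
{
    float a = 1.0f / 3.0f, b = 3.0f, c = 1.0f / 7.0f, d = 7.0f;
    /* a*b is rounded to float first, then fused with c*d: */
    printf("%a\n", fmaf(c, d, a * b));
    /* c*d is rounded to float first, then fused with a*b: */
    printf("%a\n", fmaf(a, b, c * d));
    return 0;
}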

In short, the contraction of floating-point computations is not intended to help you achieve accuracy. You might as well make sure it is disabled if you like reproducible results.
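If you do want it disabled, C99 provides a standard pragma for exactly this purpose. A minimal sketch follows; the function name is illustrative, and note that compiler support varies (some versions of GCC ignore the pragma and rely on the -ffp-contract=off command-line option instead):

/* Ask the compiler not to contract expressions in this translation unit,
   so the product below is rounded separately from the subtraction. */
#pragma STDC FP_CONTRACT OFF

float update(float a, float r, float b)
{
    return a - r * b;
}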

However, in the particular case of the fused multiply-add compound operation, you can use the C99 standard function fmaf() to tell the compiler to compute the multiplication and addition in a single step with a single rounding. If you do this, the compiler will not be allowed to produce anything other than the best result for a.


     float fmaf(float x, float y, float z);

DESCRIPTION
     The fma() functions compute (x*y)+z, rounded as one ternary operation:
     they compute the value (as if) to infinite precision and round once to
     the result format, according to the current rounding mode.

Note that if the FMA instruction is not available, your compiler's implementation of the function fmaf() will at best just use higher precision, and if this happens on your compilation platform, you might just as well use the type double for the accumulator: it will be faster and more accurate than using fmaf(). In the worst case, a flawed implementation of fmaf() will be provided.

Improving accuracy while only using single-precision

Use Kahan summation if your computation involves a long chain of additions. Some accuracy can be gained by simply summing the r*b terms computed as single-precision products, assuming there are many of them. If you wish to gain more accuracy, you might want to compute r*b itself exactly as the sum of two single-precision numbers, but if you do this you might as well switch to double-single arithmetic entirely. Double-single arithmetic is the same as the double-double technique succinctly described here, but with single-precision numbers instead.
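A minimal sketch of such a compensated (Kahan) sum of the single-precision products r[i]*b[i]; the function name and array interface are illustrative, not part of the original answer:

#include <stddef.h>

float kahan_dot(const float *r, const float *b, size_t n)
{
    float sum = 0.0f;
    float comp = 0.0f;                /* compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        float term = r[i] * b[i];     /* each product rounded once to float */
        float y = term - comp;
        float t = sum + y;            /* low-order bits of y may be lost here */
        comp = (t - sum) - y;         /* recover them for the next iteration */
        sum = t;
    }
    /* Note: aggressive options such as -ffast-math can optimize the
       compensation away and defeat the algorithm. */
    return sum;
}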
