Fused multiply add and default rounding modes

Views: 30 | Published: 2021/12/18 23:22:42 | Tags: c, gcc, clang, ieee-754, fma

Question

With GCC 5.3, the following code compiled with -O3 -mfma

float mul_add(float a, float b, float c) {
  return a*b + c;
}

produces the following assembly:

vfmadd132ss     %xmm1, %xmm2, %xmm0
ret

I noticed GCC doing this with -O3 already in GCC 4.8.

Clang 3.7 with -O3 -mfma produces

vmulss  %xmm1, %xmm0, %xmm0
vaddss  %xmm2, %xmm0, %xmm0
retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC does with -O3.

I am surprised that GCC does this with -O3, because this answer says:

The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.

This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.
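
A worked single-precision example (24-bit significand, round to nearest, ties to even) makes the "one rounding versus two" point concrete. The values a = b = 1 + 2^-12 and c = -(1 + 2^-11) are our own illustrative choice, not from the question; just above 1 the spacing between floats is 2^-23.

```latex
\begin{aligned}
ab &= 1 + 2^{-11} + 2^{-24} && \text{exact product} \\
\operatorname{fl}(ab) &= 1 + 2^{-11} && 2^{-24} \text{ is half an ulp; the tie goes to the even significand} \\
\operatorname{fl}\bigl(\operatorname{fl}(ab) + c\bigr) &= 0 && \text{MUL then ADD: two roundings} \\
\operatorname{fl}(ab + c) &= 2^{-24} && \text{FMA: one rounding, and } 2^{-24} \text{ is exact in float}
\end{aligned}
```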

However, this link says:

Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.

So now I am confused and concerned.

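Is GCC justified in using FMA with -O3?
Does fusing violate strict IEEE floating-point behaviour?
If fusing does violate IEEE floating-point behaviour, and GCC defines __STDC_IEC_559__, isn't this a contradiction?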

Since FMA can be emulated in software, it seems to me there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.

Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast; with Clang it is not. Other options such as -ffp-contract=on and -ffp-contract=off do not produce the FMA instruction.

For example, Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.

I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.

With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.

With GCC it uses fma

with -ffp-contract=fast (default).

With Clang it uses fma

with -ffp-contract=fast.
with -ffp-contract=on (default) and #pragma STDC FP_CONTRACT ON (default is OFF).

In other words, with Clang you can get fma with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.
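
A minimal sketch of the pragma-based control described above. Per the observations in the question, Clang with its default -ffp-contract=on honors #pragma STDC FP_CONTRACT, while GCC ignores the pragma (it may warn with -Wunknown-pragmas) and follows -ffp-contract only. The pragma is placed at the start of a compound statement, which the C standard permits, so it applies only to that function body. The function names are hypothetical, and whether the first one actually becomes vfmadd depends on the compiler and on -mfma.

```c
/* Hypothetical example: contraction allowed for this body only. */
float mul_add_contract_on(float a, float b, float c) {
#pragma STDC FP_CONTRACT ON
    /* Clang (default -ffp-contract=on) may emit a single FMA here
       when the target has FMA, e.g. with -mfma. */
    return a * b + c;
}

/* Hypothetical example: contraction forbidden for this body only. */
float mul_add_contract_off(float a, float b, float c) {
#pragma STDC FP_CONTRACT OFF
    /* Clang keeps a separate multiply and add here. GCC ignores the
       pragma, so under -ffp-contract=fast it may still fuse. */
    return a * b + c;
}
```

With inputs where one and two roundings disagree, such as a = b = 1 + 2^-12 and c = -(1 + 2^-11), each function returns either 0 (two roundings) or 2^-24 (fused), depending on compiler and flags.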

I looked into MSVC and ICC.

With MSVC it uses the fma instruction with /O2 /arch:AVX2 /fp:fast. With MSVC, /fp:precise is the default.

With ICC it uses fma with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because by default ICC uses -fp-model fast. But ICC uses fma even with -fp-model precise. To disable fma with ICC, use -fp-model strict or -no-fma.

So by default GCC and ICC use fma when fma is enabled (with -mfma for GCC/Clang or -march=core-avx2 with ICC) but Clang and MSVC do not.
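
Since the defaults differ between compilers, one way to see what a particular build actually does is to evaluate a*b + c as an ordinary expression on inputs where one and two roundings disagree. A sketch, assuming our illustrative inputs a = b = 1 + 2^-12 and c = -(1 + 2^-11) and FLT_EVAL_METHOD == 0 (e.g. x86-64 with SSE); the function names are hypothetical, and the result only describes this one expression under the flags it was compiled with.

```c
#include <stdbool.h>

/* Hypothetical probe: two roundings give exactly 0, one rounding (an FMA)
   gives 2^-24. The inputs are read through volatile variables so the
   compiler cannot fold the expression at compile time. */
float mul_add_probe(void) {
    volatile float va = 1.0f + 0x1p-12f;    /* a = b = 1 + 2^-12 */
    volatile float vc = -(1.0f + 0x1p-11f); /* c = -(1 + 2^-11)  */
    float a = va, c = vc;
    return a * a + c;
}

/* Assumes FLT_EVAL_METHOD == 0; with excess precision (e.g. x87),
   a nonzero result would not by itself imply that an FMA was used. */
bool mul_add_was_contracted(void) {
    return mul_add_probe() != 0.0f;
}
```

Compiling the same file with, for example, GCC -O3 -mfma versus GCC -O3 -mfma -ffp-contract=off should flip the answer if the observations above hold.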
