Fused multiply add and default rounding modes


Question

With GCC 5.3 the following code compiled with -O3 -mfma

float mul_add(float a, float b, float c) {
  return a*b + c;
}

produces the following assembly

vfmadd132ss     %xmm1, %xmm2, %xmm0
ret

I noticed GCC doing this with -O3 already in GCC 4.8.

Clang 3.7 with -O3 -mfma produces

vmulss  %xmm1, %xmm0, %xmm0
vaddss  %xmm2, %xmm0, %xmm0
retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC with -O3.

I am surprised that GCC does this with -O3, because this answer says:

The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.

This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.

However, this link says:

Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.

So now I am confused and concerned.


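  1. Is GCC justified in using FMA with -O3?

  2. Does fusing violate strict IEEE floating-point behaviour?

  3. If fusing does violate IEEE floating-point behaviour, isn't this a contradiction, since GCC defines __STDC_IEC_559__ (http://stackoverflow.com/questions/31181897/status-of-stdc-iec-559-with-modern-c-compilers)?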

Since FMA can be emulated in software, it seems there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.

Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast and with Clang it is not. Other options such as -ffp-contract=on and -ffp-contract=off do not produce the FMA instruction.

For example, Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.

I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.

With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.

With GCC it uses FMA with

  1. -ffp-contract=fast (default).

With Clang it uses FMA

  1. with -ffp-contract=fast.
  2. with -ffp-contract=on (default) and #pragma STDC FP_CONTRACT ON (default is OFF).

In other words, with Clang you can get FMA with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.

I looked into MSVC and ICC.

MSVC uses the FMA instruction with /O2 /arch:AVX2 /fp:fast. The MSVC default is /fp:precise.

ICC uses FMA with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because ICC uses -fp-model fast by default. But ICC uses FMA even with -fp-model precise. To disable FMA with ICC, use -fp-model strict or -no-fma.

So by default GCC and ICC use FMA when FMA is enabled (with -mfma for GCC/Clang or -march=core-avx2 for ICC), but Clang and MSVC do not.

Recommended answer

It doesn't violate IEEE-754, because IEEE-754 defers to languages on this point:

A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:

...

- Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition.

In standard C, the STDC FP_CONTRACT pragma provides the means to control this value-changing optimization. So GCC is licensed to perform the fusion by default, so long as it allows you to disable the optimization by setting STDC FP_CONTRACT OFF. Not supporting that pragma means not adhering to the C standard.
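As a minimal sketch of what that looks like in source (the function name is hypothetical), the pragma can be placed at file scope before the code it should affect. This assumes a compiler that honors the pragma; the question reports that GCC 5.3 ignores it, in which case -ffp-contract=off is the switch to use instead.

```c
/* Request that expressions in the rest of this file are not contracted.
   Assumption: the compiler implements this pragma. Compilers that do not
   may ignore it, possibly with an unknown-pragma warning. */
#pragma STDC FP_CONTRACT OFF

/* When the pragma is honored, this is compiled as a separate multiply
   and add (two roundings), even with FMA hardware enabled. */
float mul_add_no_contract(float a, float b, float c) {
    return a * b + c;
}
```

Inspecting the generated assembly, as done in the question, is the most direct way to confirm whether a given compiler actually respected the pragma.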
