Fused multiply add and default rounding modes
Question
With GCC 5.3, the following code compiled with -O3 -mfma
float mul_add(float a, float b, float c) {
return a*b + c;
}
produces the following assembly
vfmadd132ss %xmm1, %xmm2, %xmm0
ret
I noticed GCC doing this with -O3 already in GCC 4.8.
Clang 3.7 with -O3 -mfma produces
vmulss %xmm1, %xmm0, %xmm0
vaddss %xmm2, %xmm0, %xmm0
retq
but Clang 3.7 with -Ofast -mfma produces the same code as GCC with -O3.
I am surprised that GCC does this with -O3, because this answer says:
The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.
This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.
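To make the one-rounding versus two-roundings point in that quote concrete, here is a minimal sketch. The function names are made up for illustration. The fused result is emulated through double rather than computed with a real FMA instruction, which is only exact under the assumption stated in the comment.

```c
#include <assert.h>

/* Two roundings: the product is rounded to float, then the sum is. */
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;   /* the volatile store blocks contraction */
    return p + c;
}

/* One rounding, emulated: a float*float product is exact in double,
   so only the final conversion to float rounds. This assumes the
   double addition is exact too, which holds for the inputs below
   but not in general. */
float fused_via_double(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}

/* a*a = 1 + 2^-12 + 2^-26 exactly. The 2^-26 term is lost when the
   product is rounded to float, but survives a single final rounding. */
void check_rounding_difference(void) {
    float a = 1.0f + 0x1p-13f;
    float c = -(1.0f + 0x1p-12f);
    assert(mul_then_add(a, a, c) == 0.0f);
    assert(fused_via_double(a, a, c) == 0x1p-26f);
}
```

For these inputs the separate multiply and add returns 0, while the single-rounding result is 2^-26, so the two evaluation strategies differ visibly.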
However, this link says:
Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.
So now I am confused and concerned.
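- Is GCC justified in using FMA with -O3?
- Does fusing violate strict IEEE floating-point behaviour?
- If fusing does violate IEEE floating-point behaviour, and since GCC returns __STDC_IEC_559__ (see http://stackoverflow.com/questions/31181897/status-of-stdc-iec-559-with-modern-c-compilers), isn't this a contradiction?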
Since FMA can be emulated in software, it seems to me there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.
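As a side note on the "hardware has FMA" half of that distinction, the sketch below shows two macros that source code can test at compile time. The function names are hypothetical. `__FMA__` is the macro GCC and Clang define on x86 when -mfma (or a -march that implies it) is in effect, and `FP_FAST_FMAF` is the optional C99 macro from <math.h>. Neither says whether the compiler will actually contract a*b + c.

```c
#include <math.h>   /* may define FP_FAST_FMAF */

/* Returns 1 if this file was compiled for an x86 target with FMA
   instructions enabled (GCC and Clang define __FMA__ under -mfma). */
int target_has_fma(void) {
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the implementation advertises that fmaf() is generally
   about as fast as a separate multiply and add. */
int fmaf_is_fast(void) {
#ifdef FP_FAST_FMAF
    return 1;
#else
    return 0;
#endif
}
```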
Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast and with Clang it is not. Other options such as -ffp-contract=on and -ffp-contract=off do not produce the FMA instruction.
For example, Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.
I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.
With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.
GCC uses fma
- with -ffp-contract=fast (default).
Clang uses fma
- with -ffp-contract=fast.
- with -ffp-contract=on (default) and #pragma STDC FP_CONTRACT ON (default is OFF).
In other words, with Clang you can get fma with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.
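A minimal sketch of the kind of test file this describes, with the pragma at file scope. Going by the results above, Clang with -O3 -mfma should fuse this function, while GCC ignores the pragma and follows -ffp-contract only. Inspecting the output of -S is the way to confirm this for a given compiler version.

```c
/* Allow the compiler to contract a*b + c into one fused operation
   for the rest of this file. */
#pragma STDC FP_CONTRACT ON

float mul_add(float a, float b, float c) {
    return a * b + c;   /* may become vfmadd132ss when FMA is enabled */
}
```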
I looked into MSVC and ICC.
MSVC uses the fma instruction with /O2 /arch:AVX2 /fp:fast. With MSVC, /fp:precise is the default.
ICC uses fma with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because by default ICC uses -fp-model fast. But ICC uses fma even with -fp-model precise. To disable fma with ICC, use -fp-model strict or -no-fma.
So by default GCC and ICC use fma when fma is enabled (with -mfma for GCC/Clang or -march=core-avx2 with ICC), but Clang and MSVC do not.
Recommended answer
It doesn't violate IEEE-754, because IEEE-754 defers to languages on this point:
A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:
...
- Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition.
In standard C, the STDC FP_CONTRACT pragma provides the means to control this value-changing optimization. So GCC is licensed to perform the fusion by default, so long as it allows you to disable the optimization by setting STDC FP_CONTRACT OFF. Not supporting that means not adhering to the C standard.
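For illustration, this is roughly what disabling contraction for a single function looks like in standard C. The function name is made up. On a compiler that does not honor the pragma, the sketch still compiles but has no effect, and a command-line option such as -ffp-contract=off would be needed instead.

```c
float mul_add_unfused(float a, float b, float c) {
    /* Inside a block, the pragma must come before any declarations or
       statements, and it applies until the end of that block. */
    #pragma STDC FP_CONTRACT OFF
    return a * b + c;   /* multiply and add should each round separately */
}
```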