_mm_max_ss has different behavior between clang and gcc

Problem description

I'm trying to cross compile a project using clang and gcc but I'm seeing some odd differences when using _mm_max_ss e.g.

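#include <limits>       // std::numeric_limits
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_max_ss

__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);  // NaN as first operand
__m128 d = _mm_max_ss(b,a);  // NaN as second operand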

Now I expected std::max type behavior when NaNs are involved but clang and gcc give different results:

Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

_mm_max_ps does the expected thing when I use it. I've tried both -ffast-math and -fno-fast-math, but neither seems to have an effect. Any ideas on how to make the behavior consistent across compilers?

Godbolt link here

Recommended answer

My understanding is that IEEE-754 requires: (NaN cmp x) to return false for all cmp operators {==, <, <=, >, >=}, except {!=} which returns true. An implementation of a max() function might be defined in terms of any of the inequality operators.
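The comparison rule above can be checked directly. The following is a minimal sketch, assuming a build without -ffast-math (which allows the compiler to ignore NaNs); the helper name nan_comparisons_follow_ieee754 is made up for illustration.

```cpp
#include <limits>

// Hypothetical helper: returns true when comparing NaN against x behaves
// as described above, i.e. ==, <, <=, >, >= are all false and != is true.
bool nan_comparisons_follow_ieee754(float x) {
    // Reading through a volatile keeps the compiler from folding the
    // comparisons at compile time, so they are evaluated at run time.
    volatile float nan_source = std::numeric_limits<float>::quiet_NaN();
    const float n = nan_source;

    const bool any_ordered_true =
        (n == x) || (n < x) || (n <= x) || (n > x) || (n >= x);
    const bool not_equal = (n != x);

    return !any_ordered_true && not_equal;
}
```

Calling nan_comparisons_follow_ieee754(2.0f) should return true on an IEEE-754 conforming build.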

So, the question is, how is _mm_max_ss implemented? With {<, <=, >, >=}, or a bit comparison?

Interestingly, when disabling optimization in your link, the corresponding maxss instruction is used by both gcc and clang. Both yield:

2.000000 0.000000 0.000000 0.000000 
nan 0.000000 0.000000 0.000000

Given max(NaN, 2.0f) -> 2.0f, this suggests that max(a, b) = (a op b) ? a : b, where op is a comparison such as > or >=. Under IEEE-754 rules, the comparison is always false when either operand is NaN, so the second operand is returned:

(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN)
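The two cases above can be reproduced with a scalar model. This is only a sketch of the reasoning, not the compiler's or the hardware's actual implementation; maxss_model and model_matches_unoptimized_output are hypothetical names, and the choice of > as the comparison is an assumption consistent with the observed output.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar model of lane 0 of maxss: one ordered comparison,
// falling through to the second operand whenever the comparison is false.
float maxss_model(float a, float b) {
    return (a > b) ? a : b;
}

// Returns true when the model reproduces the unoptimized output shown
// earlier: c = max(NaN, 2.0f) is 2.0f and d = max(2.0f, NaN) is NaN.
bool model_matches_unoptimized_output() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return maxss_model(nan, 2.0f) == 2.0f        // (NaN op val) -> val
        && std::isnan(maxss_model(2.0f, nan));   // (val op NaN) -> NaN
}
```

Because the result depends on operand order, swapping the arguments changes the answer whenever exactly one of them is NaN.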

With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would - correct 'as-if' behaviour. GCC is either falling back on another implementation of max() - it uses the GMP and MPFR libraries for compile-time numerics - or is just being careless with the _mm_max_ss semantics.

GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.

From Intel's ISA reference:

If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.
