associative-math with GCC


Question

I have created a double-double data type in C. I tried -Ofast with GCC and discovered that it's dramatically faster (e.g. 1.5 s with -O3 and 0.3 s with -Ofast), but the results are bogus. I chased this down to -fassociative-math. I'm surprised this does not work, because I explicitly define the associativity of my operations when it matters. For example, in the following code I put parentheses where it matters.

static inline doublefloat two_sum(const float a, const float b) {
        float s = a + b;
        float v = s - a;
        float e = (a - (s - v)) + (b - v);
        return (doublefloat){s, e};
}

So I don't expect GCC to change e.g. (a - (s - v)) to ((a + v) - s) even with -fassociative-math. So why are the results so wrong using -fassociative-math (and so much faster)?

I tried /fp:fast with MSVC (after converting my code to C++) and the results are correct but it's no faster than /fp:precise.

The GCC manual says the following about -fassociative-math:

Allow re-association of operands in series of floating-point operations. This violates the ISO C and C++ language standard by possibly changing computation result. NOTE: re-ordering may change the sign of zero as well as ignore NaNs and inhibit or create underflow or overflow (and thus cannot be used on code that relies on rounding behavior like "(x + 2^52) - 2^52". May also reorder floating-point comparisons and thus may not be used when ordered comparisons are required. This option requires that both -fno-signed-zeros and -fno-trapping-math be in effect. Moreover, it doesn't make much sense with -frounding-math.
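
The rounding idiom the manual mentions can be sketched as follows. This is only an illustration: the function name round_magic is made up here, and the behaviour described assumes IEEE 754 doubles, round-to-nearest, and a build without -fassociative-math.

```c
/* Rounds a small non-negative double to the nearest integer.
   Doubles in [2^52, 2^53) are spaced exactly 1 apart, so adding 2^52
   forces the sum to be rounded to an integer; subtracting 2^52 again
   recovers that integer. The trick relies entirely on the rounding of
   the intermediate sum. (Name and signature are hypothetical.) */
double round_magic(double x) {
    const double two52 = 4503599627370496.0; /* 2^52 */
    double t = x + two52;   /* rounded to the nearest integer here */
    return t - two52;       /* exact subtraction */
}
```

Algebraically, (x + 2^52) - 2^52 is just x, so a compiler that is allowed to reassociate may return x unchanged, which is why the manual rules this kind of code out.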

I did some tests with integers (signed and unsigned) and floats to check whether GCC simplifies associative operations. Here is the code I tested:

//test1.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return (a + c) - b; }
signed   fooss(signed   a, signed   b, signed   c) { return (a + c) - b; }
float    foosf(float    a, float    b, float    c) { return (a + c) - b; }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return a*a*a*a*a*a; }
signed   fooms(signed   a, signed   b, signed   c) { return a*a*a*a*a*a; }
float    foomf(float    a, float    b, float    c) { return a*a*a*a*a*a; }

//test2.c
unsigned foosu(unsigned a, unsigned b, unsigned c) { return a - (b - c);     }
signed   fooss(signed   a, signed   b, signed   c) { return a - (b - c);     }
float    foosf(float    a, float    b, float    c) { return a - (b - c);     }
unsigned foomu(unsigned a, unsigned b, unsigned c) { return (a*a*a)*(a*a*a); }
signed   fooms(signed   a, signed   b, signed   c) { return (a*a*a)*(a*a*a); }
float    foomf(float    a, float    b, float    c) { return (a*a*a)*(a*a*a); }

I compiled both files with -O3 and with -Ofast, compared the generated assembly, and this is what I observed:

  • unsigned: the code was identical both for addition and multiplication (reduced to three multiplications)
  • signed: the code was not identical for addition but was for multiplication (reduced to three multiplications)
  • float: the code was not identical for addition or multiplication with -O3 however with -Ofast the addition was identical and the multiplication was almost the same using only three multiplications.

From this I conclude:

  • if an operation is associative, then GCC will simplify it however it chooses, so that a - (b - c) can become (a + c) - b.
  • unsigned addition and multiplication are associative
  • signed addition is not associative
  • signed multiplication is associative
  • a*a*a*a*a*a gets simplified to only three multiplications for integers, and for floating point when using -fassociative-math.
  • -fassociative-math causes floating point addition and multiplication to be treated as associative.
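
To see why the float rewrite matters, here is a minimal sketch comparing the two forms from the test files. The function names are invented for this illustration, and the difference only shows up when the code is built without -fassociative-math (for example with plain -O3), assuming IEEE 754 single precision.

```c
/* a - (b - c), evaluated in the order written. */
float sub_as_written(float a, float b, float c) {
    return a - (b - c);
}

/* (a + c) - b, the algebraically equivalent form that GCC
   may pick once reassociation is allowed. */
float sub_reassociated(float a, float b, float c) {
    return (a + c) - b;
}
```

With a = 1.0f and b = c = 1e8f, the first form gives 1.0f because b - c is exactly 0. The second gives 0.0f, because neighbouring floats near 1e8 are 8 apart and 1.0f + 1e8f rounds back to 1e8f.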

In other words GCC did exactly what I did not expect it to do with -fassociative-math. It converted (a - (s - v)) to ((a + v) - s).

One may think this is obvious with -fassociative-math, but there are cases where a programmer may want floating point to be associative in one case and non-associative in another. For example, auto-vectorizing a reduction over a floating-point array requires -fassociative-math, but once it is enabled the double-float type can't be used in the same module. So the only option is to put associative floating-point functions in one module and non-associative floating-point functions in another, and compile them into separate object files.

Recommended answer

"I'm surprised this does not work because I explicitly define the associativity of my operations when it matters. For example in the following code I put parentheses where it matters."

This is exactly what -fassociative-math does: it ignores the ordering defined by your program (which is just as defined without the parentheses) and does what allows simplifications instead. Typically, for double-double addition, the error term is computed as 0, because that's what it would be equal to if floating-point operations were associative. e = 0; is much faster than e = (a - …;, but of course, it is just wrong.
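
As a rough sketch of that collapse: the first function below is the two_sum from the question, and the second is a hand-written version of what it reduces to if + and - are treated as exact real arithmetic. The struct field names hi and lo and the name two_sum_collapsed are assumptions made for this illustration. It is not actual GCC output.

```c
typedef struct { float hi, lo; } doublefloat; /* field names assumed */

/* Error-free addition as written in the question:
   hi is the rounded sum, lo is the rounding error of that sum. */
static inline doublefloat two_sum(float a, float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}

/* If the operations were exact: v = (a + b) - a = b, so s - v = a,
   a - (s - v) = 0 and b - v = 0. The error term is identically 0. */
static inline doublefloat two_sum_collapsed(float a, float b) {
    return (doublefloat){a + b, 0.0f};
}
```

Built without -fassociative-math, two_sum(1.0f, 1e-8f) keeps the 1e-8f that was lost from the rounded sum in its error term, while the collapsed form discards it. That lost information is the whole point of the double-float representation.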

In the C99 standard, the following grammar rule in 6.5.6:1 implies that x + y + z can only be parsed as (x + y) + z:


additive-expression:
         multiplicative-expression
         additive-expression + multiplicative-expression
         additive-expression - multiplicative-expression

Explicit parentheses and assignments to intermediate lvalues do not prevent -fassociative-math from doing its stuff. The order was defined even without them (left-to-right in case of a sequence of additions and subtractions), and you told the compiler to ignore the defined order. In fact, on the intermediate representation the optimization is applied to, I doubt the information remains whether the order was imposed by intermediate assignments, parentheses or the grammar.

You could put all the functions that need the ordering imposed by the C standard in the same compilation unit and compile that unit without -fassociative-math, or avoid this flag altogether for the entire program. If you insist on leaving double-double addition in a compilation unit compiled with -fassociative-math, you could try playing with volatile variables, but the volatile type qualifier only makes access to the lvalue an observable event; it doesn't force the right computation to take place.
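
Another possibility, not part of the answer above and offered only as a sketch, is GCC's function-level optimize attribute, which accepts -f options without the prefix. The macro name STRICT_FP, the function name two_sum_strict, and the struct field names are made up for this example. The GCC documentation describes this attribute as intended for debugging rather than production code, and whether it fully protects a function when the file is built with -Ofast should be checked in the generated assembly.

```c
typedef struct { float hi, lo; } doublefloat; /* field names assumed */

#if defined(__GNUC__) && !defined(__clang__)
/* GCC only: ask for -fno-associative-math in this one function.
   noinline is a conservative assumption so the body is not merged
   into a caller compiled with different floating-point options. */
#define STRICT_FP __attribute__((optimize("no-associative-math"), noinline))
#else
#define STRICT_FP /* other compilers: no effect, build without fast-math */
#endif

/* Same algorithm as two_sum in the question, marked as order-sensitive. */
STRICT_FP doublefloat two_sum_strict(float a, float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return (doublefloat){s, e};
}
```

Separate compilation units, as suggested above, remain the more predictable choice; the attribute mainly saves a build-system change when only a handful of functions are order-sensitive.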
