Adding two floating-point numbers


Problem description


      I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:

      #include <stdio.h>
      #include <fenv.h>
      #pragma STDC FENV_ACCESS ON
      
      int main(int c, char *v[]){
        fesetround(FE_UPWARD);
        printf("%a\n", 0x1.0p0 + 0x1.0p-80);
      }
      

      However, if I compile and run my program with various compilers:

      $ gcc -v
      …
      gcc version 4.2.1 (Apple Inc. build 5664)
      $ gcc -Wall -std=c99 add.c && ./a.out 
      add.c:3: warning: ignoring #pragma STDC FENV_ACCESS
      0x1p+0
      $ clang -v
      Apple clang version 1.5 (tags/Apple/clang-60)
      Target: x86_64-apple-darwin10
      Thread model: posix
      $ clang -Wall -std=c99 add.c && ./a.out 
      add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring
            pragma [-Wunknown-pragmas]
      #pragma STDC FENV_ACCESS ON
                   ^
      1 warning generated.
      0x1p+0
      

      It doesn't work! (I expected the result 0x1.0000000000001p0).
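
To spell out where the expected value comes from: the exact sum 1 + 2^-80 is not representable in binary64, so it has to be rounded to one of the two neighbouring doubles, 1.0 or 1.0 + 2^-52. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and it assumes `double` is IEEE 754 binary64.

```c
#include <float.h>

/* The double directly above 1.0. DBL_EPSILON is defined as the gap between
   1.0 and the next representable double, so this sum is exact. This is the
   value that rounding 1 + 2^-80 upward should produce. */
double expected_rounded_up(void) {
    return 1.0 + DBL_EPSILON;
}

/* Nonzero when a positive addend is smaller than half the gap above 1.0.
   That is a sufficient condition for 1.0 + addend to round back down to 1.0
   under the default round-to-nearest mode. */
int lost_under_round_to_nearest(double addend) {
    return addend > 0.0 && addend < DBL_EPSILON / 2;
}
```

Since 2^-80 is far below half of 2^-52, round-to-nearest yields 1.0 (printed as 0x1p+0), while rounding upward yields 0x1.0000000000001p0.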

      Indeed, the computation was done at compile-time in the default round-to-nearest mode:

      $ clang -Wall -std=c99 -S add.c && cat add.s
      add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring
            pragma [-Wunknown-pragmas]
      #pragma STDC FENV_ACCESS ON
                   ^
      1 warning generated.
      …
      LCPI1_0:
          .quad   4607182418800017408
      …
          callq   _fesetround
          movb    $1, %cl
          movsd   LCPI1_0(%rip), %xmm0
          leaq    L_.str(%rip), %rdx
          movq    %rdx, %rdi
          movb    %cl, %al
          callq   _printf
      …
      L_.str:
          .asciz   "%a\n"
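
One way to confirm that the sum was folded at compile time is to decode the single `.quad` constant in the listing. The sketch below is an illustration, not part of the original question; the helper name `double_from_bits` is hypothetical, and it assumes `double` is IEEE 754 binary64 with the same byte order as `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit integer as the double that has the same bit pattern.
   memcpy avoids the aliasing problems of a pointer cast. */
double double_from_bits(uint64_t bits) {
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions, `double_from_bits(4607182418800017408)` is exactly 1.0 (bit pattern 0x3FF0000000000000). The only floating-point constant left in the program is therefore the already rounded sum, and no addition instruction appears between `fesetround` and `printf`.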
      

      Yes, I did see the warning emitted by each compiler. I understand that turning the applicable optimizations on or off at the scale of the line may be tricky. I would still like, if that was at all possible, to turn them off at the scale of the file, which would be enough to resolve my question.

      My question is: what command-line option(s) should I use with GCC or Clang so as to compile a C99 compilation unit that contains code intended to be executed with an FPU rounding mode other than the default?

      Digression

      While researching this question, I found this GCC C99 compliance page, containing the entry below, that I will just leave here in case someone else finds it funny. Grrrr.

      floating-point      |     |
      environment access  | N/A | Library feature, no compiler support required.
      in <fenv.h>         |     |
      

      Solution

      I couldn't find any command-line options that would do what you wanted. However, I did find a way to rewrite your code so that even with maximum optimizations (even architectural optimizations), neither GCC nor Clang computes the value at compile time. Instead, this forces them to output code that will compute the value at runtime.

      C:

      #include <fenv.h>
      #include <stdio.h>
      
      #pragma STDC FENV_ACCESS ON
      
      // add with rounding up
      double __attribute__ ((noinline)) addrup (double x, double y) {
        int round = fegetround ();
        fesetround (FE_UPWARD);
        double r = x + y;
        fesetround (round);   // restore old rounding mode
        return r;
      }
      
      int main(int c, char *v[]){
        printf("%a\n", addrup (0x1.0p0, 0x1.0p-80));
      }
      

      This results in these outputs from GCC and Clang, even when using maximum and architectural optimizations:

      gcc -S -x c -march=corei7 -O3 (Godbolt GCC):

      addrup:
              push    rbx
              sub     rsp, 16
              movsd   QWORD PTR [rsp+8], xmm0
              movsd   QWORD PTR [rsp], xmm1
              call    fegetround
              mov     edi, 2048
              mov     ebx, eax
              call    fesetround
              movsd   xmm1, QWORD PTR [rsp]
              mov     edi, ebx
              movsd   xmm0, QWORD PTR [rsp+8]
              addsd   xmm0, xmm1
              movsd   QWORD PTR [rsp], xmm0
              call    fesetround
              movsd   xmm0, QWORD PTR [rsp]
              add     rsp, 16
              pop     rbx
              ret
      .LC2:
              .string "%a\n"
      main:
              sub     rsp, 8
              movsd   xmm1, QWORD PTR .LC0[rip]
              movsd   xmm0, QWORD PTR .LC1[rip]
              call    addrup
              mov     edi, OFFSET FLAT:.LC2
              mov     eax, 1
              call    printf
              xor     eax, eax
              add     rsp, 8
              ret
      .LC0:
              .long   0
              .long   988807168
      .LC1:
              .long   0
              .long   1072693248
      

      clang -S -x c -march=corei7 -O3 (Godbolt Clang):

      addrup:                                 # @addrup
              push    rbx
              sub     rsp, 16
              movsd   qword ptr [rsp], xmm1   # 8-byte Spill
              movsd   qword ptr [rsp + 8], xmm0 # 8-byte Spill
              call    fegetround
              mov     ebx, eax
              mov     edi, 2048
              call    fesetround
              movsd   xmm0, qword ptr [rsp + 8] # 8-byte Reload
              addsd   xmm0, qword ptr [rsp]   # 8-byte Folded Reload
              movsd   qword ptr [rsp + 8], xmm0 # 8-byte Spill
              mov     edi, ebx
              call    fesetround
              movsd   xmm0, qword ptr [rsp + 8] # 8-byte Reload
              add     rsp, 16
              pop     rbx
              ret
      
      .LCPI1_0:
              .quad   4607182418800017408     # double 1
      .LCPI1_1:
              .quad   4246894448610377728     # double 8.2718061255302767E-25
      main:                                   # @main
              push    rax
              movsd   xmm0, qword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero
              movsd   xmm1, qword ptr [rip + .LCPI1_1] # xmm1 = mem[0],zero
              call    addrup
              mov     edi, .L.str
              mov     al, 1
              call    printf
              xor     eax, eax
              pop     rcx
              ret
      
      .L.str:
              .asciz  "%a\n"
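
In both listings the two operands now survive as separate constants, which can be checked by decoding them. GCC emits each double as two `.long` words, low word first on little-endian x86-64. The sketch below is an illustration only; the helper name `double_from_words` is hypothetical, and it assumes `double` is IEEE 754 binary64 with the same byte order as `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from the two 32-bit words GCC printed for it.
   'low' is the first .long in the listing, 'high' the second. */
double double_from_words(uint32_t low, uint32_t high) {
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under those assumptions, `.LC0` (0, 988807168) decodes to 2^-80 and `.LC1` (0, 1072693248) decodes to 1.0, matching the `double 8.2718061255302767E-25` and `double 1` comments in the Clang output.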
      


      Now for the more interesting part: why does that work?

      Well, when they (GCC and/or Clang) compile code, they try to find values that can be computed at compile time and replace them with the result. This is known as constant propagation. If you had simply written another function, constant propagation would cease to occur, since it isn't supposed to cross function boundaries.
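
As a small illustration of the difference, here is a sketch with two hypothetical functions that perform the same addition. In the first, both operands are literals, so a compiler may evaluate the sum during translation under round-to-nearest. In the second, the operands only arrive as parameters, so the addition can only be evaluated early if the call itself is optimized away.

```c
/* Both operands are literals: the compiler is free to replace the whole
   expression with its round-to-nearest result and emit no addition at all. */
double sum_of_literals(void) {
    return 0x1.0p0 + 0x1.0p-80;
}

/* Same arithmetic, but the operands are unknown inside this function.
   Unless the call is inlined or specialized, the addition happens at runtime. */
double sum_of_arguments(double x, double y) {
    return x + y;
}
```

With the default rounding mode both functions return 1.0, so the two cases only become distinguishable once the rounding mode is changed at runtime.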

      However, if they see a function whose body could, in theory, be substituted for the call to it, they may do so. This is known as function inlining. If function inlining will work on a function, we say that the function is (surprise) inlinable.

      If a function always returns the same results for a given set of inputs, then it is considered pure. We also say that it has no side effects (meaning it makes no changes to the environment).
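
A minimal sketch of the distinction, using two hypothetical functions that are not part of the original answer:

```c
/* Pure: the result depends only on the argument, and nothing outside the
   function is read or modified. Calling it twice with the same input gives
   the same answer. */
int square(int n) {
    return n * n;
}

/* Impure: the result depends on hidden state that each call also updates,
   so two calls with the same (empty) argument list give different answers. */
int next_ticket(void) {
    static int counter = 0;
    counter += 1;
    return counter;
}
```

Strictly speaking, the rounding mode is hidden state of this kind too. A compiler that ignores `FENV_ACCESS` assumes the default mode, which is why it treats a floating-point addition as if it were pure.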

      Now, if a function is fully inlinable (meaning that it doesn't make any calls to external libraries, excluding a few defaults that GCC and Clang know about: libc, libm, etc.) and is pure, then they will apply constant propagation to the function.

      In other words, if we don't want them to propagate constants through a function call, we can do one of two things:

      • Make the function appear impure:
        • Use the filesystem
        • Do some bullshit magic with some random input from somewhere
        • Use the network
        • Use some syscall of some sort
        • Call something from an external library unknown to GCC and/or Clang
      • Make the function not fully inlinable:
        • Call something from an external library unknown to GCC and/or Clang
        • Use __attribute__ ((noinline))

      Now, that last one is the easiest. As you may have surmised, __attribute__ ((noinline)) marks the function as non-inlinable. Since we can take advantage of this, all we have to do is make another function that does whatever computation we want, mark it with __attribute__ ((noinline)), and then call it.

      When the code is compiled, the compilers respect the noinline marking and therefore do not propagate constants into the function, so the value is computed at runtime with the appropriate rounding mode set.
