确定是否在C/C ++中对浮点运算进行了舍入 [英] Determine if rounding occurred for a floating-point operation in C/C++

查看:245
本文介绍了确定是否在C/C ++中对浮点运算进行了舍入的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试提出一种有效的方法来确定何时/将对IEEE-754操作进行舍入.不幸的是,我不能简单地检查硬件标志.它必须在几个不同的平台上运行.

I am trying to come up with an efficient method to determine when rounding will/did occur for IEEE-754 operations. Unfortunately I am not able to simply check hardware flags. It would have to run on a few different platforms.

我想到的一种方法是在不同的舍入模式下执行运算以比较结果.

One of the approaches I thought of was to perform the operation in different rounding modes to compare the results.

添加示例:

    double result = operand1 + operand2;
    // save rounding mode
    int savedMode = fegetround();
    fesetround(FE_UPWARD);
    double upResult = operand1 + operand2;
    fesetround(FE_DOWNWARD);
    double downResult = operand1 + operand2;
    // restore rounding mode
    fesetround(savedMode);
    return (result != upResult) || (result != downResult);

但这显然效率不高,因为它必须执行3次操作.

but this is obviously inefficient because it's having to perform the operation 3 times.

推荐答案

您的示例不一定会通过优化获得正确的结果 级别-O1或更高.参见以下 Godbolt链接: 编译器仅生成一个加法vaddsd.

Your example does not necessarily give the right results with optimization levels -O1 or higher. See this Godbolt link: only one addition vaddsd is generated by the compiler.

经过优化 级别-O0的汇编看起来不错,但这将导致代码效率低下. 此外,调用fegetroundfesetround相对昂贵, 与一些浮点运算的成本相比.

With optimization level -O0 the assembly looks ok, but that would lead to inefficient code. Moreover calling fegetround and fesetround is relatively expensive, compared to the cost of a few floating point operations.

下面的(自我解释)代码可能是一个有趣的选择. 它使用著名的算法2Sum和2ProdFMA.在没有硬件fma或fma仿真的系统上,可以使用2Prod算法代替2ProdFMA, 例如,参见准确的浮点乘积和幂, 由Stef Graillat撰写.

The (self explaining) code below is probably an interesting alternative. It uses the well-known algorithms 2Sum and 2ProdFMA. On systems without hardware fma, or fma emulation, you can use the 2Prod algorithm instead of 2ProdFMA, See, for example, Accurate Floating Point Product and Exponentiation, by Stef Graillat.

/*
gcc -m64 -Wall -O3 -march=haswell round_ex.c -lm
   or with fma emulation on systems without hardware fma support, for example:
gcc -m64 -Wall -O3  -march=nehalem  round_ex.c -lm
*/

#include<math.h>
#include<float.h>
#include<stdio.h>

int add_is_not_exact(double operand1, double operand2){
    double a = operand1;
    double b = operand2;
    double s, t, a_1, b_1, d_a, d_b;
    /* Algorithm 2Sum computes s and t such that a + b = s + t, exactly.         */
    /* Here t is the error of the floating-point addition s = a + b.             */
    /* See, for example, On the robustness of the 2Sum and Fast2Sum algorithms   */
    /* by Boldo, Graillat, and Muller                                            */
    s = a + b;
    a_1 = s - b;
    b_1 = s - a_1;
    d_a = a - a_1;
    d_b = b - b_1;
    t = d_a + d_b;
    return (t!=0.0);
}


int sub_is_not_exact(double operand1, double operand2){
    return add_is_not_exact(operand1, -operand2);
}


int mul_is_not_exact(double operand1, double operand2){
    double a = operand1;
    double b = operand2;
    double s, t;
    /* Algorithm 2ProdFMA computes s and t such that a * b = s + t, exactly.     */
    /* Here t is the error of the floating-point multiplication s = a * b.       */
    /* See, for example, Accurate Floating Point Product and Exponentiation      */
    /* by Graillat                                                               */
    s = a * b;
    t = fma(a, b, -s);
    if (s!=0) return (t!=0.0);       /* No underflow of a*b                                */
    else return (a!=0.0)&&(b!=0.0);  /* Underflow: inexact if s=0, but (a!=0.0)&&(b!=0.0)  */
}


int div_is_not_exact(double operand1, double operand2){
    double a = operand1;
    double b = operand2;
    double s, t;
    s = a / b;
    t = fma(s, b, -a);  /* fma(x,y,z) computes x*y+z with infinite intermediate precision */
    return (t!=0.0);
}


int main(){

    printf("add_is_not_exact(10.0, 1.0) = %i\n", add_is_not_exact(10.0, 1.0));
    printf("sub_is_not_exact(10.0, 1.0) = %i\n", sub_is_not_exact(10.0, 1.0));
    printf("mul_is_not_exact( 2.5, 2.5) = %i\n", mul_is_not_exact( 2.5, 2.5));
    printf("div_is_not_exact(  10, 2.5) = %i\n", div_is_not_exact(  10, 2.5));
    printf("add_is_not_exact(10.0, 0.1) = %i\n", add_is_not_exact(10.0, 0.1));
    printf("sub_is_not_exact(10.0, 0.1) = %i\n", sub_is_not_exact(10.0, 0.1));
    printf("mul_is_not_exact( 2.6, 2.6) = %i\n", mul_is_not_exact( 2.6, 2.6));
    printf("div_is_not_exact(  10, 2.6) = %i\n", div_is_not_exact(  10, 2.6));

    printf("\n0x1.0p-300 = %20e, 0x1.0p-600 = %20e \n", 0x1.0p-300 , 0x1.0p-600 );
    printf("mul_is_not_exact( 0x1.0p-300, 0x1.0p-300) = %i\n", mul_is_not_exact( 0x1.0p-300, 0x1.0p-300));
    printf("mul_is_not_exact( 0x1.0p-600, 0x1.0p-600) = %i\n", mul_is_not_exact( 0x1.0p-600, 0x1.0p-600));

}

输出为:

$ ./a.out
add_is_not_exact(10.0, 1.0) = 0
sub_is_not_exact(10.0, 1.0) = 0
mul_is_not_exact( 2.5, 2.5) = 0
div_is_not_exact(  10, 2.5) = 0
add_is_not_exact(10.0, 0.1) = 1
sub_is_not_exact(10.0, 0.1) = 1
mul_is_not_exact( 2.6, 2.6) = 1
div_is_not_exact(  10, 2.6) = 1

0x1.0p-300 =         4.909093e-91, 0x1.0p-600 =        2.409920e-181 
mul_is_not_exact( 0x1.0p-300, 0x1.0p-300) = 0
mul_is_not_exact( 0x1.0p-600, 0x1.0p-600) = 1


如评论中所述,也可以直接阅读 控制和状态寄存器:

As noted in the comments, it is also possible to directly read the control and status register:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

int add_is_not_exact_v2(double a, double b)
{    
    fexcept_t excepts;
    feclearexcept(FE_ALL_EXCEPT);
    double c = a+b;
    int tst = fetestexcept(FE_INEXACT);
    return (tst!=0);
}

但是,请注意,这可能不适用于编译器优化级别-O1或更高. 在这种情况下,addsd双重加法指令有时会被完全优化掉, 导致错误的结果. 例如,使用gcc 8.2 gcc -m64 -O1 -march=nehalem:

Note, however, that this may not work with compiler optimization level -O1 or higher. In that case the addsd double add instruction is sometimes optimized away completely, leading to wrong results. For example, with gcc 8.2 gcc -m64 -O1 -march=nehalem:

add_is_not_exact_v2:
        sub     rsp, 8
        mov     edi, 61
        call    feclearexcept
        mov     edi, 32
        call    fetestexcept
        test    eax, eax
        setne   al
        movzx   eax, al
        add     rsp, 8
        ret

具有优化级别-O0,具有2个函数调用,并且相对 扩展指令来修改控制和状态寄存器,这不一定是最有效的解决方案.

With optimization level -O0, with 2 function calls, and with relatively expansive instructions to modify the control and status register, this is not necessarily the most efficient solution.

这篇关于确定是否在C/C ++中对浮点运算进行了舍入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆