Is there a correct constant-expression, in terms of a float, for its msb?


Question

The problem: given a floating point constant expression, can we write a macro that evaluates to a constant expression whose value is a power of two equal to the most significant place of the significand? Equivalently, this is just the greatest power of two less than or equal to the input in magnitude.
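To pin down the target value, here is a minimal runtime sketch of "the greatest power of two less than or equal to the input in magnitude". It is only a reference for checking candidate macros: it calls library functions, so it is not itself a constant expression. The name msb_reference is hypothetical and does not come from the question.

```c
#include <math.h>

/* Runtime reference only (hypothetical helper, not a constant expression):
   greatest power of two <= |x|, for finite nonzero x. */
double msb_reference(double x)
{
    int exponent;

    /* frexp returns a fraction with magnitude in [0.5, 1) and sets
       exponent so that |x| == fraction * 2^exponent. */
    (void)frexp(fabs(x), &exponent);

    /* The leading bit of the significand therefore has weight
       2^(exponent - 1). */
    return ldexp(1.0, exponent - 1);
}
```

For example, an input of 5.0 lies in [4, 8), so the function returns 4.0.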

For the purposes of this question we can ignore:

  • Near-overflow or near-underflow values (they can be handled with finitely many applications of ?: to rescale).
  • Negative inputs (they can be handled likewise).
  • Non-Annex-F-conforming implementations (can't really do anything useful in floating point with them).
  • Weirdness around excess precision (float_t and double_t can be used with FLT_EVAL_METHOD and other float.h macros to handle it safely).

So it suffices to solve the problem for positive values bounded away from infinity and the denormal range.

Note that this problem is equivalent to finding the "epsilon" for a specific value, that is, nextafter(x,INF)-x (or the equivalent in float or long double), with the result just scaled by DBL_EPSILON (or equivalent for the type). Solutions that find that are perfectly acceptable if they're simpler.

I have a proposed solution I'm posting as a self-answer, but I'm not sure if it's correct.

Recommended answer

Here is code for finding the ULP. It was inspired by algorithm 3.5 in Accurate Floating-Point Summation by Siegfried M. Rump, Takeshi Ogita, and Shin’ichi Oishi (which calculates 2^⌈log2 |p|⌉):

#include <float.h>  /* DBL_EPSILON, DBL_MIN */
#include <math.h>   /* fabs */

double ULP(double q)
{
    // SmallestPositive is the smallest positive floating-point number.
    static const double SmallestPositive = DBL_EPSILON * DBL_MIN;

    /*  Scale is .75 ULP, so multiplying it by any significand in [1, 2) yields
        something in [.75 ULP, 1.5 ULP) (even with rounding).
    */
    static const double Scale = 0.75 * DBL_EPSILON;

    q = fabs(q);

    // Handle denormals, and get the lowest normal exponent as a bonus.
    if (q < 2*DBL_MIN)
        return SmallestPositive;

    /*  Subtract from q something more than .5 ULP but less than 1.5 ULP.  That
        must produce q - 1 ULP.  Then subtract that from q, and we get 1 ULP.

        The significand 1 is of particular interest.  We subtract .75 ULP from
        q, which is midway between the greatest two floating-point numbers less
        than q.  Since we round to even, the lesser one is selected, which is
        less than q by 1 ULP of q, although 2 ULP of itself.
    */
    return q - (q - q * Scale);
}

The fabs call and the if statement can each be replaced with the ?: operator.
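A minimal sketch of that substitution, written as macros so the result can appear in a constant expression. The macro names are hypothetical, not from the answer. It assumes a double argument, round-to-nearest-even at translation time, and no excess precision (FLT_EVAL_METHOD equal to 0); the argument is expanded several times, so it should be a side-effect-free constant.

```c
#include <float.h>

/* Hypothetical macro names; the answer gives only the function form. */

/* Absolute value without calling fabs. */
#define ULP_ABS(x)    ((x) < 0 ? -(x) : (x))

/* 0.75 ULP of 1.0, the same Scale constant as in the function. */
#define ULP_SCALE     (0.75 * DBL_EPSILON)

/* Same arithmetic as the function's return statement, for normal q. */
#define ULP_NORMAL(q) ((q) - ((q) - (q) * ULP_SCALE))

/* The if statement becomes ?:, with the same threshold and the same
   smallest-positive result for small inputs. */
#define ULP_CONST(x) \
    (ULP_ABS(x) < 2 * DBL_MIN ? DBL_EPSILON * DBL_MIN \
                              : ULP_NORMAL(ULP_ABS(x)))

/* Most significant place, using the DBL_EPSILON scaling from the
   question (meaningful only for normal inputs). */
#define MSB_CONST(x)  (ULP_CONST(x) / DBL_EPSILON)

/* A file-scope initializer must be a constant expression, so this
   declaration compiles only if the macro expands to one. */
const double ulp_of_one = ULP_CONST(1.0);
```

Using the macro in a file-scope initializer, as in the last line, is one way to confirm that the compiler accepts it as a constant expression.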

For reference, the 2^⌈log2 |p|⌉ algorithm is:

q = p / eps        (eps is the unit roundoff, FLT_EPSILON / 2 for float)
L = |(q+p) - q|
if L = 0
    L = |p|
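The pseudocode can be sketched in C for float as follows. This is an illustration under stated assumptions, not the paper's code: it takes the divisor to be the unit roundoff (FLT_EPSILON / 2, that is 2^-24), assumes round-to-nearest-even, and assumes p is normal and small enough that p / eps does not overflow. The function name is hypothetical, and the volatile temporaries are there only to force each intermediate to be rounded to float.

```c
#include <float.h>
#include <math.h>

/* Hypothetical name. Returns the smallest power of two >= |p|,
   assuming round-to-nearest-even, normal p, and no overflow. */
float next_power_two(float p)
{
    /* Assumption: the divisor is the unit roundoff, 2^-24 for float. */
    const float unit_roundoff = FLT_EPSILON / 2;

    /* volatile forces rounding to float after each operation. */
    volatile float q = p / unit_roundoff;
    volatile float sum = q + p;

    /* Adding p to q rounds p up to one ULP of q, which is the next
       power of two above |p|, unless p is itself a power of two. */
    float L = fabsf(sum - q);

    /* If p is a power of two, the tie rounds back to q and L is 0. */
    return L == 0.0f ? fabsf(p) : L;
}
```

For instance, 1.25f and 2.0f both yield 2.0f, while 3.0f yields 4.0f.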
