符合IEEE 754的double类型的sqrt()实现 [英] IEEE 754 conformant sqrt() implementation for double type

查看:94
本文介绍了符合IEEE 754的double类型的sqrt()实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试实现 double __ieee754_sqrt(double x)函数,该函数使用硬件指令来获得第一近似值:

 <代码>双精度__ieee754_sqrt(双精度x){双z;/*获得平方根的倒数(精度为6.75位)*/__asm("QSEED.DF%0,%1 \ n":"= e"(z):"e"(x):);z = 1/z;z =(z + x/z)/2;/*第1次Newton-Raphson迭代*/z =(z + x/z)/2;/*第2次Newton-Raphson迭代*/z =(z + x/z)/2;/*第3次Newton-Raphson迭代*/z =(z + x/z)/2;/*第4次Newton-Raphson迭代*/返回z;} 

但是,paranoia.c(链接链接)测试抱怨:

 平方根没有被切碎,也没有正确地四舍五入.观察到的错误从-6.0493828e-01到5.0000000e-01 ulps. 

问题:如何实现用于斩波和正确舍入的其他逻辑?

UPD.硬件本身不支持 sqrt().硬件仅支持求平方根的倒数(精度为6.75位).

UPD2.

  1. 使用njuffa的解决方案(非常感谢!)进行一些小的更改:使用 qseeddf()代替 qseedf() =>使用 fma()代替 fmaf().为什么?因为它省略了 double< =&f; float 转换,因此速度更快.
  2. 是的,硬件支持融合乘法加法指令(FMA).
  3. 感谢大家参与讨论并提供详细答案!
  4. 对于所有对此主题感兴趣的人,这里是 sqrt()实现的列表:

    1. 来自Cygwin数学.库( libm ): cygwin-snapshot-20200710-1/newlib/libm/math/e_sqrt.c :版权所有 Sun Microsystems版权所有(C)1993/code>.
    2. 从GNU C库( glibc ):

      1. glibc-2.31/sysdeps/ieee754/dbl-64/e_sqrt.c :标题为 IBM精确数学库.
      2. glibc-2.31/sysdeps/powerpc/fpu/e_sqrt.c :使用 __ builtin_fma()函数.

解决方案

在着手构建自己的实现之前,建议先在互联网上进行搜索,以检查是否有合适的和经过测试的开源代码./p>

通用迭代算法对互逆的平方根使用无除法迭代以达到所需的精度,然后将自变量与参数相乘以计算平方根,最后使用所需的舍入模式进行舍入.倒数平方根的迭代可以使用具有二次收敛性的牛顿-拉夫森迭代(将正确位数大约加倍),也可以使用具有三次收敛性的哈雷迭代(将正确位数增加大约三倍).尽管存在高阶迭代,但通常不使用它们.

为使代码简单,建议在二进制浮点运算的情况下,将参数减小为包含两个连续二进制数的单个窄间隔.请注意,由于需要进行指数操作,因此通常不会实现最高性能.出于性能原因,双精度实现的初始迭代通常以单精度执行.

在下面的示例性ISO-C99实现中,我将展示如何沿着这些直线实现正确取整的双精度平方根.我假设 double 映射到IEEE-754 binary64 ,并且 float 映射到IEEE-754 binary32 .我仅限于使用IEEE-754舍入至最近或偶数模式实现的 sqrt .

非常重要我假设处理器硬件提供了融合的乘法加法指令,并且已通过标准数学库函数 fmaf fma正确公开了这些指令.在评论中,我曾要求OP澄清FMA的可用性,但决定在获得反馈之前就着手编写代码.没有FMA的实现是可能的,但是更具挑战性,足够完整的处理可能会超出Stackoverflow答案的范围.

由于OP未指定目标体系结构或未提供起始近似值的详细信息,因此我在下面使用基于区间[0.25,1]的多项式最小极大近似的我自己的起始近似,所有非异常参数均已减小到该区间. qseedf()结果精确到大约7位,因此比OP的内置硬件稍好.这种差异是否显着,我无法评估.

该算法(特别是舍入逻辑)依赖于Peter Markstein的思想,因此,我有理由相信该算法在构造上是正确的.我在这里仅执行了非常基本的测试.最佳行业实践是通过数学方法证明这种算法的正确性,例如,请参阅David Russinoff和John Harrison的出版物.紧要关头,一个人可能可以通过对两个连续的Binade进行详尽的测试(如今这是可行的,一个小型集群运行了几天),再加上对所有Binade进行基于随机和基于模式的测试.

  #include< stdio.h>#include< stdlib.h>#include< stdint.h>#include< string.h>#include< math.h>/*在[0.25,1]上大约为1/sqrt(a),精度约为7位*/浮点qseedf(浮点a){浮动r;r = -2.43845296f;r = fmaf(r,a,6.22994471f);r = fmaf(r,a,-5.91090727f);r = fmaf(r,a,3.11237526f);返回r;}双重my_sqrt(双重a){const double QNAN_INDEFINITE = 0.0/0.0;const double half = 0.5;const double three_eighth = 0.375;doublefine_rsqrt_approx,sqrt_approx,sqrt_residual,结果,b;双重rsqrt_approx,rsqrt_approx_err,rsqrt_approx_squared,reduce_arg;浮点argf,近似,近似f_err;int,t,f;/*处理正常情况*/if((a> = 0)&&(a< INFINITY)){/*计算指数调整*/b = frexp(a,& e);t = e-2 * 512;f = t/2;t = t-2 * f;f = f + 512;/*将参数映射到主要近似区间[0.25,1)*/reduce_arg = ldexp(b,t);/*计算初始低精度近似值*/argf =(float)reduced_arg;大约= qseedf(argf);/*应用两个具有二次收敛性的Newton-Raphson迭代*/roxf_err = fmaf(-argf,大约* *大约1.0f);近似值= fmaf(0.5f *近似值,roxf_err,近似值);roxf_err = fmaf(-argf,大约* *大约1.0f);近似值= fmaf(0.5f *近似值,roxf_err,近似值);/* rsqrt近似现在精确到1个单精度ulp */rsqrt_approx =(双精度)approxf;/*执行三次收敛的哈雷迭代.基于工作彼得·马克斯坦(Peter Markstein)的作品.参见:Peter Markstein,"IA-64和基础"功能",Prentice Hall 2000*/rsqrt_approx_squared = rsqrt_approx * rsqrt_approx;rsqrt_approx_err = fma(-reduced_arg,rsqrt_approx_squared,1.0);excellent_rsqrt_approx = fma(fma(rsqrt_approx_err,three_eighth,half),rsqrt_approx * rsqrt_approx_err,rsqrt_approx);sqrt_approx =减少的参数* *精制的rsqrt_approx;sqrt_residual = fma(-sqrt_approx,sqrt_approx,reduce_arg);结果= fma(sqrt_residual,一半* fined_rsqrt_approx,sqrt_approx);/*通过干扰指数从主近似间隔映射回*/结果= ldexp(结果,f);} 别的 {/*处理特殊情况*/结果=(a< 0)?QNAN_INDEFINITE:(a + a);}返回结果;}/*https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J发件人:geo< gmars ... @ gmail.com>新闻组:sci.math,comp.lang.c,comp.lang.fortran主题:64位KISS RNG日期:2009年2月28日,星期六,04:30:48 -0800(PST)这个64位的KISS RNG具有三个组成部分,每个组成部分足以独自服务.这些组件是:乘载乘积(MWC),期间(2 ^ 121 + 2 ^ 63-1)Xorshift(XSH),期间2 ^ 64-1同余(CNG),期间2 ^ 64*/静态uint64_t kiss64_x = 1234567890987654321ULL;静态uint64_t kiss64_c = 123456123456123456ULL;静态uint64_t kiss64_y = 362436362436362436ULL;静态uint64_t kiss64_z = 1066149217761810ULL;静态uint64_t kiss64_t;#define MWC64(kiss64_t =(kiss64_x<< 58)+ kiss64_c,\kiss64_c =(kiss64_x>> 6),kiss64_x + = kiss64_t,\kiss64_c + =(kiss64_x< kiss64_t),kiss64_x)#定义XSH64(kiss64_y ^ =(kiss64_y<< 13),kiss64_y ^ =(kiss64_y>>> 17),\kiss64_y ^ =(kiss64_y<< 43))#定义CNG64(kiss64_z = 6906969069ULL * kiss64_z + 1234567ULL)#定义KISS64(MWC64 + XSH64 + CNG64)int main(无效){const uint64_t N = 10000000000ULL;/*所需的测试用例数*/double arg,ref,res;uint64_t argi,refi,resi,计数= 0;double spec [] = {0,1,INFINITY,NAN};printf(测试一些特殊情况:\ n");for(int i = 0; i< sizeof(spec)/sizeof(spec [0]); i ++){printf("my_sqrt(%22.13a)=%22.13a \ n",spec [i],my_sqrt(spec [i]));printf("my_sqrt(%22.13a)=%22.13a \ n",-spec [i],my_sqrt(-spec [i]));}printf(测试%llu个随机情况:\ n",N);做 {数++;阿尔吉= KISS64;memcpy(& arg,& argi,sizeof arg);res = my_sqrt(arg);ref = sqrt(arg);memcpy(& resi,& res,sizeof resi);memcpy(& refi,& ref,sizeof refi);如果(resi!= refi){printf("\ rerror @ arg =%22.13a res =%22.13a ref =%22.13a \ n",arg,res,ref);返回EXIT_FAILURE;}if((count& 0xfffff)== 0)printf("\ r [%llu]",count);} while(count< N);printf("\ r [%llu]",计数);printf("\ ntestsPASSED \ n");返回EXIT_SUCCESS;} 

以上程序的输出应类似于以下内容:

 测试一些特殊情况:my_sqrt(0x0.0000000000000p + 0)= 0x0.0000000000000p + 0my_sqrt(-0x0.0000000000000p + 0)= -0x0.0000000000000p + 0my_sqrt(0x1.0000000000000p + 0)= 0x1.0000000000000p + 0my_sqrt(-0x1.0000000000000p + 0)= -0x1.#IND000000000p + 0my_sqrt(0x1.#INF000000000p + 0)= 0x1.#INF000000000p + 0my_sqrt(-0x1.#INF000000000p + 0)= -0x1.#IND000000000p + 0my_sqrt(0x1.#QNAN00000000p + 0)= 0x1.#QNAN00000000p + 0my_sqrt(-0x1.#QNAN00000000p + 0)= -0x1.#QNAN00000000p + 0测试10000000000个随机案例:[10000000000]通过测试 

I'm trying to implement the double __ieee754_sqrt(double x) function which uses hardware instruction to obtain the 1st approximation:

double __ieee754_sqrt(double x) {
    double z;
    /* get reciprocal of the square root (6.75 bits accuracy) */
    __asm(" QSEED.DF %0,%1 \n": "=e" (z):"e" (x):);
    z = 1 / z;
    z = ( z + x / z) / 2; /* 1st Newton-Raphson iteration */
    z = ( z + x / z) / 2; /* 2nd Newton-Raphson iteration */
    z = ( z + x / z) / 2; /* 3rd Newton-Raphson iteration */
    z = ( z + x / z) / 2; /* 4th Newton-Raphson iteration */
    return z;
}

However, paranoia.c (link, link) test complains:

Square root is neither chopped nor correctly rounded.
Observed errors run from -6.0493828e-01 to 5.0000000e-01 ulps. 

Question: how to implement additional logic for chopping and correct rounding?

UPD. The hardware does not natively support sqrt(). The hardware supports only obtaining of the reciprocal of the square root (6.75 bits accuracy).

UPD2.

  1. Used njuffa's solution (many thanks!) with minor changes: use qseeddf() instead of qseedf() => use fma() instead of fmaf(). Why? Because it omits double<=>float conversions and hence faster.
  2. Yes, fused multiply-add instructions (FMA) are supported by the hardware.
  3. Thanks to all for participating in the discussion and for the detailed answers!
  4. To all interested in the topic, here is the list of sqrt() implementations:

    1. From Cygwin math. library (libm): cygwin-snapshot-20200710-1/newlib/libm/math/e_sqrt.c: copyrighted Copyright (C) 1993 by Sun Microsystems.
    2. From GNU C library (glibc):

      1. glibc-2.31/sysdeps/ieee754/dbl-64/e_sqrt.c: entitled IBM Accurate Mathematical Library.
      2. glibc-2.31/sysdeps/powerpc/fpu/e_sqrt.c: using __builtin_fma() functions.

解决方案

Before embarking on the construction of one's own implementation, it is advisable to search the internet to check if suitable and well-tested open-source code is available.

Common iterative algorithms use division-free iterations for the reciprocal square root to the desired accuracy, back-multiply with the argument to compute the square root, and finally round using the desired rounding mode. Iterations for the reciprocal square root can use either Newton-Raphson iterations with quadratic convergence (roughly doubling the number of correct bits) or Halley iterations with cubic convergence (roughly tripling the number of correct bits). While higher-order iterations exist, they are typically not used.

To keep the code simple, it is advisable to reduce the argument to a single narrow interval comprising two consecutive binades in the case of binary floating-point arithmetic. Note that this generally does not result in the highest performance implementation due to the need for exponent manipulation. For performance reasons, the initial iteration(s) for a double-precision implementation are often performed in single precision.

In the exemplary ISO-C99 implementation below I am showing how a correctly rounded double-precision square root can be implemented along those lines. I am assuming that double maps to IEEE-754 binary64 and that float maps to IEEE-754 binary32. I am restricting to a sqrt implemented with IEEE-754 round-to-nearest-or-even mode.

Very importantly I am assuming that the processor hardware provides fused multiply-add instructions and that these are correctly exposed via the standard math library functions fmaf and fma. In comments I had asked for clarification from OP as to the availability of FMA, but decided to get started on the code before feedback was available. Implementations without FMA are possible but much more challenging, and a sufficiently complete treatment would likely exceed the space of a Stackoverflow answer.

Since OP did not specify the target architecture or provide details of the starting approximation, I am using my own starting approximation below based on a polynomial minimax approximation on the interval [0.25, 1] to which all non-exceptional arguments are reduced. qseedf() results are accurate to about 7 bit, so slightly better than OP's built-in hardware. Whether this difference is significant, I cannot assess.

The algorithm, in particular the rounding logic, relies on the ideas of Peter Markstein, therefore I am reasonably confident that the algorithm is correct by construction. I have implemented only very rudimentary testing here. Best industry practice is to mathematically prove the correctness of such algorithms, see publications by David Russinoff and John Harrison for example. In a pinch, one might be able to get away with an exhaustive test across two consecutive binades (feasible these days with a small cluster running for a few days), coupled with random and pattern-based tests exercising all binades.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Approximate 1/sqrt(a) on [0.25, 1] with an accuracy of about 7 bits */
float qseedf (float a)
{
    float r;

    r =             -2.43845296f;
    r = fmaf (r, a,  6.22994471f);
    r = fmaf (r, a, -5.91090727f);
    r = fmaf (r, a,  3.11237526f);
    return r;
}

double my_sqrt (double a)
{    
    const double QNAN_INDEFINITE = 0.0 / 0.0;
    const double half = 0.5;
    const double three_eighth = 0.375;
    double refined_rsqrt_approx, sqrt_approx, sqrt_residual, result, b;
    double rsqrt_approx, rsqrt_approx_err, rsqrt_approx_squared, reduced_arg;
    float argf, approxf, approxf_err;
    int e, t, f;

    /* handle normal cases */
    if ((a >= 0) && (a < INFINITY)) {
        /* compute exponent adjustments */
        b = frexp (a, &e);
        t = e - 2*512;
        f = t / 2;
        t = t - 2 * f;
        f = f + 512;

        /* map argument into the primary approximation interval [0.25,1) */
        reduced_arg = ldexp (b, t);
        
        /* Compute initial low-precision approximation */
        argf = (float)reduced_arg;
        approxf = qseedf (argf);
        
        /* Apply two Newton-Raphson iterations with quadratic convergence */
        approxf_err = fmaf (-argf, approxf * approxf, 1.0f);
        approxf = fmaf (0.5f * approxf, approxf_err, approxf);
        approxf_err = fmaf (-argf, approxf * approxf, 1.0f);
        approxf = fmaf (0.5f * approxf, approxf_err, approxf);
        
        /* rsqrt approximation is now accurate to 1 single-precision ulp */
        rsqrt_approx = (double)approxf;

        /* Perform a Halley iteration wih cubic convergence. Based on the work
           of Peter Markstein. See: Peter Markstein, "IA-64 and Elementary 
           Functions", Prentice Hall 2000
        */
        rsqrt_approx_squared = rsqrt_approx * rsqrt_approx;
        rsqrt_approx_err = fma (-reduced_arg, rsqrt_approx_squared, 1.0);
        refined_rsqrt_approx = fma (fma (rsqrt_approx_err, three_eighth, half), 
                                rsqrt_approx * rsqrt_approx_err, rsqrt_approx);
        sqrt_approx = reduced_arg * refined_rsqrt_approx;
        sqrt_residual = fma (-sqrt_approx, sqrt_approx, reduced_arg);
        result = fma (sqrt_residual, half * refined_rsqrt_approx, sqrt_approx);

        /* map back from primary approximation interval by jamming exponent */
        result = ldexp (result, f);
    } else {
        /* handle special cases */
        result = (a < 0) ? QNAN_INDEFINITE : (a + a);
    }
    return result;
}

/*
  https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
  From: geo <gmars...@gmail.com>
  Newsgroups: sci.math,comp.lang.c,comp.lang.fortran
  Subject: 64-bit KISS RNGs
  Date: Sat, 28 Feb 2009 04:30:48 -0800 (PST)

  This 64-bit KISS RNG has three components, each nearly
  good enough to serve alone.    The components are:
  Multiply-With-Carry (MWC), period (2^121+2^63-1)
  Xorshift (XSH), period 2^64-1
  Congruential (CNG), period 2^64
*/
static uint64_t kiss64_x = 1234567890987654321ULL;
static uint64_t kiss64_c = 123456123456123456ULL;
static uint64_t kiss64_y = 362436362436362436ULL;
static uint64_t kiss64_z = 1066149217761810ULL;
static uint64_t kiss64_t;
#define MWC64  (kiss64_t = (kiss64_x << 58) + kiss64_c, \
                kiss64_c = (kiss64_x >> 6), kiss64_x += kiss64_t, \
                kiss64_c += (kiss64_x < kiss64_t), kiss64_x)
#define XSH64  (kiss64_y ^= (kiss64_y << 13), kiss64_y ^= (kiss64_y >> 17), \
                kiss64_y ^= (kiss64_y << 43))
#define CNG64  (kiss64_z = 6906969069ULL * kiss64_z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)

int main (void)
{
    const uint64_t N = 10000000000ULL; /* desired number of test cases */
    double arg, ref, res;
    uint64_t argi, refi, resi, count = 0;
    double spec[] = {0, 1, INFINITY, NAN};

    printf ("test a few special cases:\n");
    for (int i = 0; i < sizeof (spec)/sizeof(spec[0]); i++) {
        printf ("my_sqrt(%22.13a) = %22.13a\n", spec[i], my_sqrt(spec[i]));
        printf ("my_sqrt(%22.13a) = %22.13a\n", -spec[i], my_sqrt(-spec[i]));
    }
    
    printf ("test %llu random cases:\n", N);
    do {
        count++;
        argi = KISS64;
        memcpy (&arg, &argi, sizeof arg);
        res = my_sqrt (arg);
        ref = sqrt (arg);
        memcpy (&resi, &res, sizeof resi);
        memcpy (&refi, &ref, sizeof refi);
        if (resi != refi) {
            printf ("\rerror @ arg=%22.13a  res=%22.13a  ref=%22.13a\n",
                    arg, res, ref);
            return EXIT_FAILURE;
        }
        if ((count & 0xfffff) == 0) printf ("\r[%llu]", count);
    } while (count < N);
    printf ("\r[%llu]", count);
    printf ("\ntests PASSED\n");
    return EXIT_SUCCESS;
}

The output of the above program should look similar to this:

test a few special cases:
my_sqrt(  0x0.0000000000000p+0) =   0x0.0000000000000p+0
my_sqrt( -0x0.0000000000000p+0) =  -0x0.0000000000000p+0
my_sqrt(  0x1.0000000000000p+0) =   0x1.0000000000000p+0
my_sqrt( -0x1.0000000000000p+0) =  -0x1.#IND000000000p+0
my_sqrt(  0x1.#INF000000000p+0) =   0x1.#INF000000000p+0
my_sqrt( -0x1.#INF000000000p+0) =  -0x1.#IND000000000p+0
my_sqrt(  0x1.#QNAN00000000p+0) =   0x1.#QNAN00000000p+0
my_sqrt( -0x1.#QNAN00000000p+0) =  -0x1.#QNAN00000000p+0
test 10000000000 random cases:
[10000000000]
tests PASSED

这篇关于符合IEEE 754的double类型的sqrt()实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆