用C计算互补误差函数erfcinv()的逆 [英] Computing the inverse of the complementary error function, erfcinv(), in C

查看:286
本文介绍了用C计算互补误差函数erfcinv()的逆的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

ISO C99引入了erf()erfc()函数以及其特定于精度的类似物,但莫名其妙地也没有添加它们的反函数.互补误差函数的反函数erfcinv在统计和概率计算中特别有用,例如,作为标准正态分布CDF的反函数normcdfinv的基础.

ISO C99 introduced the erf() and erfc() functions along with their precision-specific analogues, but inexplicably didn't add their inverses as well. The inverse of the complementary error function, erfcinv, in particular is useful in statistical and probability computations, for example as a building block for the inverse of the CDF of the standard normal distribution, normcdfinv.

我检查了ISO/IEC JTC1/的文档档案 SC22/WG14 C工作组,目前在ISO C18中仍不支持erfcinv(),似乎也没有计划将其添加到ISO C2x中.

I checked the document archive of the ISO/IEC JTC1/SC22/WG14 C working group and there is currently still no support for erfcinv() in ISO C18, nor do there appear to be any plans to add it in ISO C2x which is currently being worked on.

C程序员可以创建自己的实现.准确地计算erfcinv()具有挑战性,而在数学上erfcinv (x) = erfinv (1 - x)时,当x接近零时,在有限精度浮点算术中其数值属性较差.实现4 ulp的误差范围,与Intel向量数学库中的LA配置文件的规格相同,似乎是一个合理的精度目标.如果计算允许大量使用由普通处理器体系结构(例如x86-64,ARM64,PowerPC和GPU)提供的融合乘加运算(FMA),则也将是有利的.最后,如果该算法适合某些体系结构中可用的快速倒数或倒数平方根计算,将很有用.

C programmers are left to create their own implementations. Computing erfcinv() accurately is challenging, and while mathematically erfcinv (x) = erfinv (1 - x), this has poor numerical properties in finite-precision floating-point arithmetic when x is close to zero. Achieving a 4 ulp error bound, identical to the specification of the LA profile in Intel's vector math library seems a reasonable accuracy goal. It would also be advantageous if the computation allows for copious use of the fused multiply-add operation (FMA) provided by common processor architectures, such as x86-64, ARM64, PowerPC, and GPUs. Finally, it would be useful if the algorithm is amenable to the use of fast reciprocal or reciprocal square root computation available with some architectures.

几年前,我在《应用晶体学杂志》上的一篇论文中找到了合适的人选.通过纯单精度计算,使用FMA和快速倒数,我能够实现3.18707 ulp的最大误差.只是后来才发现,本文的实现是基于ACM算法602使用的Fortran子程序MERFI的C转换:

Years ago, I identified a suitable candidate in a paper in the Journal of Applied Crystallography. With pure single-precision computation, use of FMA and fast reciprocal, I was able to achieve a maximum error of 3.18707 ulp. Only belatedly did I discover that the paper's implementation was based on a C translation of a Fortran subroutine MERFI used by ACM algorithm 602:

Theodore Fessler,William F. Ford,David A. Smith,"HURRY:标量序列和序列的加速算法",数学软件上的ACM交易,第1卷. 1983年9月,第9号,第3号, pp.355-357(在线)

Theodore Fessler, William F. Ford, David A. Smith, "HURRY: An Acceleration Algorithm for Scalar Sequences and Series", ACM Transactions on Mathematical Software, Vol. 9, No. 3, September 1983, pp. 355-357 (online)

不幸的是,算法602的 netlib的软件包显示了MERFI的版权声明:"1978年由IMSL,INC.保留所有权利"以及明确规定的限制,即不得提取IMSL代码以用于其他软件开发.

Unfortunately, netlib's software package for algorithm 602 shows a copyright notice for MERFI: "1978 BY IMSL, INC. ALL RIGHTS RESERVED" along with the explicitly stated restriction that IMSL code may not be extracted for use in other software development.

在考虑到上述设计目标的情况下,什么是实现单精度erfcinvf()的良好算法?

What would be a good algorithm for the implementation of single-precision erfcinvf(), under consideration of the design goals stated above?

推荐答案

前一段时间,我发布了答案,其中显示了如何用C编程逆向误差函数erfinvf()的准确,高效的单精度实现.虽然数字问题使我们无法在其整个受支持的输入域[0,2]上从erfinvf计算erfcinvf.可以通过对输入参数进行简单的转换来重用中心区域[2.1875e-3,1.998125]中对erfcinvf()的先前答案中的核心近似.这样会添加错误的 ulp 不到一半.

Some while ago, I posted an answer showing how one can program an accurate and efficient single-precision implementation of the inverse error function erfinvf() in C. While numerical issues prevent us from computing erfcinvf from erfinvf over its entire supported input domain [0,2], we can re-use on of the core approximations in the previous answer for erfcinvf() in the central region [2.1875e-3, 1.998125] with a simple transformation to the input arguments. This adds less than half an ulp of error.

关于erfcinvf()尾巴的计算,我们观察到 对称性使我们可以将近似值限制为[0,2.1875e-3),并且对于此间隔erfcinv(x)非常近似sqrt (-log(x)).使用它来转换输入自变量会产生一个相当容易用多项式近似的函数.可以借助Remez算法(如我在此处所做的)或使用Maple或Mathematica等通用数学软件来制作minimax近似值.

With regard to the computation of the erfcinvf() tails we observe that functional symmetry allows us to limit our approximation to [0, 2.1875e-3) and that for this interval erfcinv(x) is very roughly sqrt (-log(x)). Using this to transform the input arguments results in a function that is reasonably easy to approximate with a polynomial. The minimax approximation can be crafted with the help of the Remez algorithm (as I have done here) or with common mathematical software like Maple or Mathematica.

在大多数平台上,sqrtf()直接映射到硬件指令,该指令根据IEEE-754或等效的软件仿真来计算正确取整的平方根.在logf()的实现之间通常存在更多的可变性,因此在下面的示例C代码中,我添加了自己的实现my_logf()以帮助实现可重现性.但是,任何logf()的全面实现,即误差范围为1 ulp的实现,都应产生与以下erfcinvf()代码非常相似的精度,从而实现3.13 ulp误差范围.除了普通的C实现之外,我还将演示如何利用SSE的倒数平方根和倒数平方根功能来实现erfcinvf().

On most platforms, sqrtf() maps directly to a hardware instruction that computes a correctly-rounded square root according to IEEE-754, or an equivalent software emulation. There is typically more variability between implementations of logf(), so in the exemplary C code below I have added my own implementation my_logf() to aid reproducability. However, any faithfully-rounded implementation of logf(), i.e. one with an error bound of 1 ulp, should result in very similar accuracy as in the erfcinvf() code below, which achieves a 3.13 ulp error bound. In addition to the plain C implementation, I am also demonstrating how to utilize the reciprocal and reciprocal square root capabilities of SSE for the implementation of erfcinvf().

#include <math.h>
#include <stdint.h>

#define PORTABLE  (1)

float my_logf (float a);
#if !PORTABLE
#include "immintrin.h"
float sse_recipf (float a);
float sse_rsqrtf (float a);
#endif // !PORTABLE

/* Compute inverse of the complementary error function. For the central region,
   re-use the polynomial approximation for erfinv. For the tail regions, use an
   approximation based on the observation that erfcinv(x) is very approximately
   sqrt(-log(x)).

   PORTABLE=1 max. ulp err. = 3.12017
   PORTABLE=0 max. ulp err. = 3.13523
*/
float my_erfcinvf (float a)
{
    float r;

    if ((a >= 2.1875e-3f) && (a <= 1.998125f)) { // max. ulp err. = 2.77667
        float p, t;
        t = fmaf (-a, a, a + a);
        t = my_logf (t);
        p =              5.43877832e-9f;  //  0x1.75c000p-28 
        p = fmaf (p, t,  1.43286059e-7f); //  0x1.33b458p-23 
        p = fmaf (p, t,  1.22775396e-6f); //  0x1.49929cp-20 
        p = fmaf (p, t,  1.12962631e-7f); //  0x1.e52bbap-24 
        p = fmaf (p, t, -5.61531961e-5f); // -0x1.d70c12p-15 
        p = fmaf (p, t, -1.47697705e-4f); // -0x1.35be9ap-13 
        p = fmaf (p, t,  2.31468701e-3f); //  0x1.2f6402p-9 
        p = fmaf (p, t,  1.15392562e-2f); //  0x1.7a1e4cp-7 
        p = fmaf (p, t, -2.32015476e-1f); // -0x1.db2aeep-3 
        t = fmaf (p, t,  8.86226892e-1f); //  0x1.c5bf88p-1 
        r = fmaf (t, -a, t);
    } else {
        float p, q, s, t;
        t = (a >= 1.0f) ? (2.0f - a) : a;
        t = 0.0f - my_logf (t);
#if PORTABLE
        s = sqrtf (1.0f / t);
#else // PORTABLE
        s = sse_rsqrtf (t);
#endif // PORTABLE
        p =              2.23100796e+1f;  //  0x1.64f616p+4
        p = fmaf (p, s, -5.23008537e+1f); // -0x1.a26826p+5
        p = fmaf (p, s,  5.44409714e+1f); //  0x1.b3871cp+5
        p = fmaf (p, s, -3.35030403e+1f); // -0x1.0c063ap+5
        p = fmaf (p, s,  1.38580027e+1f); //  0x1.bb74c2p+3
        p = fmaf (p, s, -4.37277269e+0f); // -0x1.17db82p+2
        p = fmaf (p, s,  1.53075826e+0f); //  0x1.87dfc6p+0
        p = fmaf (p, s,  2.97993328e-2f); //  0x1.e83b76p-6
        p = fmaf (p, s, -3.71997419e-4f); // -0x1.86114cp-12
        p = fmaf (p, s, s);
#if PORTABLE
        r = 1.0f / p;
#else // PORTABLE
        r = sse_recipf (p);
        if (t == INFINITY) r = t;
#endif // PORTABLE
        if (a >= 1.0f) r = 0.0f - r;
    }
    return r;
}

/* Compute inverse of the CDF of the standard normal distribution.
   max ulp err = 4.08385
*/
float my_normcdfinvf (float a)
{
    return fmaf (-1.41421356f, my_erfcinvf (a + a), 0.0f);
}

/* natural logarithm. max ulp err = 0.85089 */
float my_logf (float a)
{
    float m, r, s, t, i, f;
    int32_t e;
    const float cutoff_f = 0.666666667f;
    if ((a > 0.0f) && (a <=  0x1.fffffep+127f)) { // 3.40282347e+38
        m = frexpf (a, &e);
        if (m < cutoff_f) {
            m = m + m;
            e = e - 1;
        }
        i = (float)e;
        f = m - 1.0f;
        s = f * f;
        /* Compute log1p(f) for f in [-1/3, 1/3] */
        r =             -0x1.0ae000p-3f;  // -0.130310059
        t =              0x1.208000p-3f;  //  0.140869141
        r = fmaf (r, s, -0x1.f1988ap-4f); // -0.121483363
        t = fmaf (t, s,  0x1.1e5740p-3f); //  0.139814854
        r = fmaf (r, s, -0x1.55b36ep-3f); // -0.166846141
        t = fmaf (t, s,  0x1.99d8b2p-3f); //  0.200120345
        r = fmaf (r, s, -0x1.fffe02p-3f); // -0.249996200
        r = fmaf (t, f, r);
        r = fmaf (r, f,  0x1.5554fap-2f); //  0.333331972
        r = fmaf (r, f, -0x1.000000p-1f); // -0.500000000
        r = fmaf (r, s, f);
        r = fmaf (i, 0x1.62e430p-01f, r); //  0.693147182 // log(2)
    } else {
        r = a + a;  // silence NaNs if necessary
        if (a  < 0.0f) r = 0.0f/0.0f; // QNaN INDEFINITE
        if (a == 0.0f) r = -INFINITY; // -INF
    }
    return r;
}

float sse_recipf (float a)
{
    __m128 t;
    float e, r;
    t = _mm_set_ss (a);
    t = _mm_rcp_ss (t);
    _mm_store_ss (&r, t);
    e = fmaf (0.0f - a, r, 1.0f);
    e = fmaf (e, e, e);
    r = fmaf (e, r, r);
    return r;
}


float sse_rsqrtf (float a)
{
    __m128 t;
    float e, r;
    t = _mm_set_ss (a);
    t = _mm_rsqrt_ss (t);
    _mm_store_ss (&r, t);
    e = fmaf (0.0f - a, r * r, 1.0f);
    r = fmaf (fmaf (0.375f, e, 0.5f), e * r, r);
    return r;
}

这篇关于用C计算互补误差函数erfcinv()的逆的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆