Which exponentiation algorithms do CPUs/programming languages use?


Question


I've been learning about faster exponentiation algorithms (k-ary, sliding window, etc.), and was wondering which ones are used in CPUs/programming languages? (I'm fuzzy on whether this happens in the CPU or through the compiler.)


And just for kicks, which is the fastest?


Edit regarding the broadness: It's intentionally broad because I know there are a bunch of different techniques to do this. The checked answer had what I was looking for.

Answer


I assume your interest is in implementation of the exponentiation functions that can be found in standard math libraries for HLLs, in particular C/C++. These include the functions exp(), exp2(), exp10(), and pow(), as well as single-precision counterparts expf(), exp2f(), exp10f(), and powf().


The exponentiation methods you mention (such as k-ary, sliding window) are typically employed in cryptographic algorithms, such as RSA, which is exponentiation based. They are not typically used for the exponentiation functions provided via math.h or cmath. The implementation details for standard math functions like exp() differ, but a common scheme follows a three-step process:

  1. reduction of the function argument to a primary approximation interval
  2. approximation of a suitable base function on the primary approximation interval
  3. mapping back the result for the primary interval to the entire range of the function


An auxiliary step is often the handling of special cases. These can pertain to special mathematical situations such as log(0.0), or special floating-point operands such as NaN (Not a Number).
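For contrast with the library scheme just outlined, the cryptographic methods mentioned earlier (k-ary, sliding window) all refine the basic binary "square-and-multiply" idea. A minimal sketch as a toy 64-bit modular exponentiation (the function name modexp is my own; real RSA code uses multi-precision integers and windowed exponent recoding):

```c
#include <stdint.h>

/* Toy left-to-right binary ("square-and-multiply") modular exponentiation.
   Operands must stay below 2^32 so intermediate products fit in 64 bits;
   production crypto code uses multi-precision arithmetic and k-ary or
   sliding-window exponent recoding instead. */
uint64_t modexp (uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t r = 1 % mod;
    base %= mod;
    for (int i = 63; i >= 0; i--) {
        r = (r * r) % mod;        /* square for every exponent bit */
        if ((exp >> i) & 1) {
            r = (r * base) % mod; /* multiply when the bit is set */
        }
    }
    return r;
}
```

The windowed variants process several exponent bits per multiplication, trading table storage for fewer multiplies.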


The C99 code for expf(float) below shows in exemplary fashion what those steps look like for a concrete example. The argument a is first split such that exp(a) = e^r * 2^i, where i is an integer and r is in [log(sqrt(0.5)), log(sqrt(2.0))], the primary approximation interval. In the second step, we now approximate e^r with a polynomial. Such approximations can be designed according to various design criteria such as minimizing absolute or relative error. The polynomial can be evaluated in various ways, including Horner's scheme and Estrin's scheme.
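To illustrate the two evaluation schemes (this is my own example, not taken from the answer's code), here is the same degree-3 polynomial evaluated both ways; the coefficients are the truncated Taylor series of e^x, used only as a stand-in, not minimax coefficients:

```c
/* Horner vs. Estrin for p(x) = 1 + x + x^2/2 + x^3/6 (truncated Taylor
   series of e^x, used here only as a stand-in polynomial). Horner is a
   fully serial dependency chain; Estrin splits the polynomial around x^2
   so the two halves can be computed in parallel by the hardware. */
float p_horner (float x)
{
    return ((x / 6.0f + 0.5f) * x + 1.0f) * x + 1.0f;
}

float p_estrin (float x)
{
    float x2 = x * x;
    return (x / 6.0f + 0.5f) * x2 + (x + 1.0f);
}
```

Both compute the same polynomial; they can differ by an ulp or so because the roundings occur in a different order.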


The code below uses a very common approach by employing a minimax approximation, which minimizes the maximum error over the entire approximation interval. A standard algorithm for computing such approximations is the Remez algorithm. Evaluation is via Horner's scheme; the numerical accuracy of this evaluation is enhanced by the use of fmaf().


This standard math function implements what is known as a fused multiply-add, or FMA. This computes a*b+c using the full, unrounded product a*b during the addition and applying a single rounding at the end. On most modern hardware, such as GPUs, IBM Power CPUs, recent x86 processors (e.g. Haswell), and recent ARM processors (as an optional extension), this maps straight to a hardware instruction. On platforms that lack such an instruction, fmaf() will map to fairly slow emulation code, in which case we would not want to use it if we are interested in performance.


The final computation is the multiplication by 2^i, for which C and C++ provide the function ldexp(). In "industrial strength" library code one typically uses a machine-specific idiom here that takes advantage of the use of IEEE-754 binary arithmetic for float. Lastly, the code cleans up cases of overflow and underflow.
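One such idiom can be sketched as follows (my own sketch, assuming IEEE-754 binary32 and a normal-range exponent; the function name is invented):

```c
#include <stdint.h>
#include <string.h>

/* Build the float 2**i directly from its IEEE-754 bit pattern: for i in
   [-126, 127] the value 2**i has a zero fraction field and biased
   exponent i + 127 in bits 30..23. Industrial-strength code additionally
   handles the subnormal and overflow ranges, which this sketch omits. */
float scale_by_pow2 (float x, int i)
{
    uint32_t bits = (uint32_t)(i + 127) << 23; /* exponent field of 2**i */
    float s;
    memcpy (&s, &bits, sizeof s);              /* reinterpret bits as float */
    return x * s;
}
```

This avoids the function-call and integer-to-float conversion overhead of a generic ldexpf().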


The x87 FPU inside x86 processors has an instruction F2XM1 that computes 2^x - 1 on [-1,1]. This can be used for the second step of the computation of exp() and exp2(). There is an instruction FSCALE which is used to multiply by 2^i in the third step. A common way of implementing F2XM1 itself is as microcode that utilizes a rational or polynomial approximation. Note that the x87 FPU is maintained mostly for legacy support these days. On modern x86 platforms, libraries typically use pure software implementations based on SSE and algorithms similar to the one shown below; some combine small tables with polynomial approximations.


pow(x,y) can be conceptually implemented as exp(y*log(x)), but this suffers from significant loss of accuracy when x is near unity and y is large in magnitude, as well as from incorrect handling of the numerous special cases specified in the C/C++ standards. One way to get around the accuracy issue is to compute log(x) and the product y*log(x) in some form of extended precision. The details would fill an entire, lengthy separate answer, and I do not have code handy to demonstrate it. In various C/C++ math libraries, pow(double,int) and powf(float, int) are computed by a separate code path that applies the "square-and-multiply" method with bit-wise scanning of the binary representation of the integer exponent.
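That integer-exponent path can be sketched as follows (a right-to-left variant of the bit scan; my_powfi is an invented name, and a production version would also manage rounding and overflow more carefully):

```c
/* "Square-and-multiply" for powf(float, int): scan the exponent bits from
   least to most significant, squaring p each step so it holds x**(2**k),
   and multiplying it into the result when the corresponding bit is set. */
float my_powfi (float x, int n)
{
    unsigned int e = (n < 0) ? (0u - (unsigned int)n) : (unsigned int)n;
    float r = 1.0f;
    float p = x;
    while (e) {
        if (e & 1u) r *= p; /* bit set: fold in the current power of x */
        p *= p;             /* advance p to the next power x**(2**k) */
        e >>= 1;
    }
    return (n < 0) ? (1.0f / r) : r;
}
```

The unsigned negation handles n == INT_MIN without undefined behavior.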

#include <math.h> /* import fmaf(), ldexpf() */

/* Like rintf(), but -0.0f -> +0.0f, and |a| must be < 2**22 */
float quick_and_dirty_rintf (float a)
{
    float cvt_magic = 0x1.800000p+23f;
    return (a + cvt_magic) - cvt_magic;
}

/* Approximate exp(a) on the interval [log(sqrt(0.5)), log(sqrt(2.0))]. 
   Maximum ulp error = 0.70951
*/
float expf_poly (float a)
{ 
    float r;

    r =             0x1.6ab95ep-10f;
    r = fmaf (r, a, 0x1.126d28p-07f);
    r = fmaf (r, a, 0x1.5558f8p-05f);
    r = fmaf (r, a, 0x1.55543ap-03f);
    r = fmaf (r, a, 0x1.fffffap-02f);
    r = fmaf (r, a, 0x1.000000p+00f);
    r = fmaf (r, a, 0x1.000000p+00f);
    return r;
}

/* Approximate exp2() on interval [-0.5,+0.5]
   Maximum ulp error = 0.79961 
*/
float exp2f_poly (float a)
{ 
    float r;

    r =             0x1.4308f2p-13f;
    r = fmaf (r, a, 0x1.5f0722p-10f);
    r = fmaf (r, a, 0x1.3b2bd4p-07f);
    r = fmaf (r, a, 0x1.c6af80p-05f);
    r = fmaf (r, a, 0x1.ebfbdep-03f);
    r = fmaf (r, a, 0x1.62e430p-01f);
    r = fmaf (r, a, 0x1.000000p+00f);
    return r;
}

/* Approximate exp10(a) on [log(sqrt(0.5))/log(10), log(sqrt(2.0))/log(10)].
   Maximum ulp error = 0.77879
*/
float exp10f_poly (float a)
{ 
    float r;

    r =             0x1.a65694p-3f;
    r = fmaf (r, a, 0x1.158a00p-1f);
    r = fmaf (r, a, 0x1.2bda9ap+0f);
    r = fmaf (r, a, 0x1.046f72p+1f);
    r = fmaf (r, a, 0x1.535248p+1f);
    r = fmaf (r, a, 0x1.26bb1cp+1f);
    r = fmaf (r, a, 0x1.000000p+0f);
    return r;
}

/* Compute exponential base e. Maximum ulp error = 0.88790 */
float my_expf (float a)
{
    float t, r;
    int i;

    t = a * 0x1.715476p+0f;           // 1/log(2)
    t = quick_and_dirty_rintf (t);
    i = (int)t;
    r = fmaf (t, -0x1.62e430p-1f, a); // log_2_hi
    r = fmaf (t, 0x1.05c610p-29f, r); // log_2_lo
    t = expf_poly (r);
    r = ldexpf (t, i);
    if (a < -105.0f) r = 0.0f;
    if (a >  105.0f) r = 1.0f/0.0f;   // +INF
    return r;
}

/* Compute exponential base 2. Maximum ulp error = 0.87922 */
float my_exp2f (float a)
{
    float t, r;
    int i;

    t = quick_and_dirty_rintf (a);
    i = (int)t;
    r = a - t;
    t = exp2f_poly (r);
    r = ldexpf (t, i);
    if (a < -152.0f) r = 0.0f;
    if (a >  152.0f) r = 1.0f/0.0f;   // +INF
    return r;
}

/* Compute exponential base 10. Maximum ulp error = 0.95588 */
float my_exp10f (float a)
{
    float r, t;
    int i;

    t = a * 0x1.a934f0p+1f;
    t = quick_and_dirty_rintf (t);
    i = (int)t;
    r = fmaf (t, -0x1.344136p-2f, a); // log10(2)_hi
    r = fmaf (t, 0x1.ec10c0p-27f, r); // log10(2)_lo
    t = exp10f_poly (r);
    r = ldexpf (t, i);
    if (a < -46.0f) r = 0.0f;
    if (a >  46.0f) r = 1.0f/0.0f;    // +INF
    return r;
}

