快速精通计算:可提高精度,而不会失去太多的表现? [英] Fast Exp calculation: possible to improve accuracy without losing too much performance?

查看:601
本文介绍了快速精通计算:可提高精度,而不会失去太多的表现?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想指出,以前在描述的快速EXP(x)函数这个回答的太问题上提高计算速度在C#中:

I am trying out the fast Exp(x) function that previously was described in this answer to an SO question on improving calculation speed in C#:

public static double Exp(double x)
{
  var tmp = (long)(1512775 * x + 1072632447);
  return BitConverter.Int64BitsToDouble(tmp << 32);
}



该表达式使用某些IEEE浮点招数,并主要用于神经套使用。该功能比普通的 Math.Exp(X)函数更快的约5倍。相对于普通的 Math.Exp(X)

The expression is using some IEEE floating point "tricks" and is primarily intended for use in neural sets. The function is approximately 5 times faster than the regular Math.Exp(x) function.

不幸的是,数字精度只有-4% $ C>功能,理想情况下,我想至少子%的范围内有准确度。

Unfortunately, the numeric accuracy is only -4% -- +2% relative to the regular Math.Exp(x) function, ideally I would like to have accuracy within at least the sub-percent range.

我已经绘制的近似和常规精通功能之间的商,并且如在图中可以看出的相对差异似乎与几乎恒定的频率重复。

I have plotted the quotient between the approximate and the regular Exp functions, and as can be seen in the graph the relative difference appears to be repeated with practically constant frequency.

是否有可能采取这种规律性的优势来提高快速EXP功能的精确度进一步而不显着降低了计算速度,还是会的精度提高的计算开销超过原始表达式的计算收益?

Is it possible to take advantage of this regularity to improve the accuracy of the "fast exp" function further without substantially reducing the calculation speed, or would the computational overhead of an accuracy improvement outweigh the computational gain of the original expression?

(作为一个方面说明,我也尝试的 办法在相同的SO问题提出的替代之一,但这种方法似乎没有在C#计算效率,至少对于一般的情况下)。

(As a side note, I have also tried one of the alternative approaches proposed in the same SO question, but this approach does not seem to be computationally efficient in C#, at least not for the general case.)

更新5月14日

在从@Adriano的要求,我现在已经进行了很简单的基准。我已经使用每个替代的实施千万计算的 EXP 的浮点值的范围是[-100,100]功能。由于值的范围,我希望从-20到0跨度我也曾经明确列出在x的函数值= -5。下面是结果:

Upon request from @Adriano, I have now performed a very simple benchmark. I have performed 10 million computations using each of the alternative exp functions for floating point values in the range [-100, 100]. Since the range of values I am interested in spans from -20 to 0 I have also explicitly listed the function value at x = -5. Here are the results:

      Math.Exp: 62.525 ms, exp(-5) = 0.00673794699908547
Empty function: 13.769 ms
     ExpNeural: 14.867 ms, exp(-5) = 0.00675211846828461
    ExpSeries8: 15.121 ms, exp(-5) = 0.00641270968867667
   ExpSeries16: 32.046 ms, exp(-5) = 0.00673666189488182
          exp1: 15.062 ms, exp(-5) = -12.3333325982094
          exp2: 15.090 ms, exp(-5) = 13.708332516253
          exp3: 16.251 ms, exp(-5) = -12.3333325982094
          exp4: 17.924 ms, exp(-5) = 728.368055056781
          exp5: 20.972 ms, exp(-5) = -6.13293614238501
          exp6: 24.212 ms, exp(-5) = 3.55518353166184
          exp7: 29.092 ms, exp(-5) = -1.8271053775984
      exp7 +/-: 38.482 ms, exp(-5) = 0.00695945286970704

ExpNeural 的相当于本文的开头指定的精通的功能。的 ExpSeries8 的是,我原本声称并不在.NET非常有效的制定;正是像实施尼尔时,它实际上是非常快的。的 ExpSeries16 的是类似的公式,但有16次乘法而不是8的 EXP1 的通过的 exp7 的是从下阿德里亚诺的回答不同的功能。最后变种的 exp7 的是一个变体,其中的符号的 X 的检查;如果为负函数返回 1 / EXP(-x)代替。

ExpNeural is equivalent to the Exp function specified in the beginning of this text. ExpSeries8 is the formulation that I originally claimed was not very efficient on .NET; when implementing it exactly like Neil it was actually very fast. ExpSeries16 is the analogous formula but with 16 multiplications instead of 8. exp1 through exp7 are the different functions from Adriano's answer below. The final variant of exp7 is a variant where the sign of x is checked; if negative the function returns 1/exp(-x) instead.

不幸的是,的的阿德里亚诺上市EXPN 的功能在我正在考虑更广泛的负值范围足够了。由尼尔科菲的级数的方式似乎更适合在我的价值范围内,但它与过快扩张较大的负的 X 的,使用唯一8乘法时更是如此。

Unfortunately, neither of the expN functions listed by Adriano are sufficient in the broader negative value range I am considering. The series expansion approach by Neil Coffey seems to be more suitable in "my" value range, although it is too rapidly diverging with larger negative x, especially when using "only" 8 multiplications.

推荐答案

在情况下,有人想复制的问题所示的相对误差函数,这里是用Matlab的方式(快的指数是不是在Matlab非常快,但它是正确的):

In case anyone wants to replicate the relative error function shown in the question, here's a way using Matlab (the "fast" exponent is not very fast in Matlab, but it is accurate):

t = 1072632447+[0:ceil(1512775*pi)];
x = (t - 1072632447)/1512775;
ex = exp(x);
t = uint64(t);
import java.lang.Double;
et = arrayfun( @(n) java.lang.Double.longBitsToDouble(bitshift(n,32)), t );
plot(x, et./ex);

现在,错误的时间正好恰逢当的二进制值TMP 从尾数到指数溢出。且让我们打破数据转换成箱被丢弃成为指数位(使其定期报告),并只保留高8其余位(让我们的查找表,一个合理的规模):

Now, the period of the error exactly coincides with when the binary value of tmp overflows from the mantissa into the exponent. Let's break our data into bins by discarding the bits that become the exponent (making it periodic), and keeping only the high eight remaining bits (to make our lookup table a reasonable size):

index = bitshift(bitand(t,uint64(2^20-2^12)),-12) + 1;

现在我们计算平均需要进行调整:

Now we calculate the mean required adjustment:

relerrfix = ex./et;
adjust = NaN(1,256);
for i=1:256; adjust(i) = mean(relerrfix(index == i)); end;
et2 = et .* adjust(index);



相对误差降低到+/- 0.0006。当然,其他表的尺寸也是可能的(例如,具有64个条目的6位表给出+/- 0.0025)和误差在表的大小几乎是线性的。表项之间的线性插值会还进一步提高了错误,但在牺牲性能为代价。既然我们已经达到了精度的目标,让我们避免任何进一步的性能命中。

The relative error is decreased to +/- .0006. Of course, other tables sizes are possible as well (for example, a 6-bit table with 64 entries gives +/- .0025) and the error is almost linear in table size. Linear interpolation between table entries would improve the error yet further, but at the expense of performance. Since we've already met the accuracy goal, let's avoid any further performance hits.

在这一点上是一些琐碎的编辑技巧,采取​​利用MATLAB计算出的值,并创建一个在C#中查找表。对于每一个计算,我们添加一个位掩码,查找,精度和双精度乘法。

At this point it's some trivial editor skills to take the values computed by MatLab and create a lookup table in C#. For each computation, we add a bitmask, table lookup, and double-precision multiply.

static double FastExp(double x)
{
    var tmp = (long)(1512775 * x + 1072632447);
    int index = (int)(tmp >> 12) & 0xFF;
    return BitConverter.Int64BitsToDouble(tmp << 32) * ExpAdjustment[index];
}



该增速是非常相似的原代码 - 我的电脑,这约30%的速度编译为x86和3倍左右的速度为64。随着ideone单,这是一个庞大的亏损(但这样是)。

完整的源代码和测试用例: http://ideone.com/UwNgx

Complete source code and testcase: http://ideone.com/UwNgx

using System;
using System.Diagnostics;

namespace fastexponent
{
    class Program
    {
        static double[] ExpAdjustment = new double[256] {
            1.040389835,
            1.039159306,
            1.037945888,
            1.036749401,
            1.035569671,
            1.034406528,
            1.033259801,
            1.032129324,
            1.031014933,
            1.029916467,
            1.028833767,
            1.027766676,
            1.02671504,
            1.025678708,
            1.02465753,
            1.023651359,
            1.022660049,
            1.021683458,
            1.020721446,
            1.019773873,
            1.018840604,
            1.017921503,
            1.017016438,
            1.016125279,
            1.015247897,
            1.014384165,
            1.013533958,
            1.012697153,
            1.011873629,
            1.011063266,
            1.010265947,
            1.009481555,
            1.008709975,
            1.007951096,
            1.007204805,
            1.006470993,
            1.005749552,
            1.005040376,
            1.004343358,
            1.003658397,
            1.002985389,
            1.002324233,
            1.001674831,
            1.001037085,
            1.000410897,
            0.999796173,
            0.999192819,
            0.998600742,
            0.998019851,
            0.997450055,
            0.996891266,
            0.996343396,
            0.995806358,
            0.995280068,
            0.99476444,
            0.994259393,
            0.993764844,
            0.993280711,
            0.992806917,
            0.992343381,
            0.991890026,
            0.991446776,
            0.991013555,
            0.990590289,
            0.990176903,
            0.989773325,
            0.989379484,
            0.988995309,
            0.988620729,
            0.988255677,
            0.987900083,
            0.987553882,
            0.987217006,
            0.98688939,
            0.98657097,
            0.986261682,
            0.985961463,
            0.985670251,
            0.985387985,
            0.985114604,
            0.984850048,
            0.984594259,
            0.984347178,
            0.984108748,
            0.983878911,
            0.983657613,
            0.983444797,
            0.983240409,
            0.983044394,
            0.982856701,
            0.982677276,
            0.982506066,
            0.982343022,
            0.982188091,
            0.982041225,
            0.981902373,
            0.981771487,
            0.981648519,
            0.981533421,
            0.981426146,
            0.981326648,
            0.98123488,
            0.981150798,
            0.981074356,
            0.981005511,
            0.980944219,
            0.980890437,
            0.980844122,
            0.980805232,
            0.980773726,
            0.980749562,
            0.9807327,
            0.9807231,
            0.980720722,
            0.980725528,
            0.980737478,
            0.980756534,
            0.98078266,
            0.980815817,
            0.980855968,
            0.980903079,
            0.980955475,
            0.981017942,
            0.981085714,
            0.981160303,
            0.981241675,
            0.981329796,
            0.981424634,
            0.981526154,
            0.981634325,
            0.981749114,
            0.981870489,
            0.981998419,
            0.982132873,
            0.98227382,
            0.982421229,
            0.982575072,
            0.982735318,
            0.982901937,
            0.983074902,
            0.983254183,
            0.983439752,
            0.983631582,
            0.983829644,
            0.984033912,
            0.984244358,
            0.984460956,
            0.984683681,
            0.984912505,
            0.985147403,
            0.985388349,
            0.98563532,
            0.98588829,
            0.986147234,
            0.986412128,
            0.986682949,
            0.986959673,
            0.987242277,
            0.987530737,
            0.987825031,
            0.988125136,
            0.98843103,
            0.988742691,
            0.989060098,
            0.989383229,
            0.989712063,
            0.990046579,
            0.990386756,
            0.990732574,
            0.991084012,
            0.991441052,
            0.991803672,
            0.992171854,
            0.992545578,
            0.992924825,
            0.993309578,
            0.993699816,
            0.994095522,
            0.994496677,
            0.994903265,
            0.995315266,
            0.995732665,
            0.996155442,
            0.996583582,
            0.997017068,
            0.997455883,
            0.99790001,
            0.998349434,
            0.998804138,
            0.999264107,
            0.999729325,
            1.000199776,
            1.000675446,
            1.001156319,
            1.001642381,
            1.002133617,
            1.002630011,
            1.003131551,
            1.003638222,
            1.00415001,
            1.004666901,
            1.005188881,
            1.005715938,
            1.006248058,
            1.006785227,
            1.007327434,
            1.007874665,
            1.008426907,
            1.008984149,
            1.009546377,
            1.010113581,
            1.010685747,
            1.011262865,
            1.011844922,
            1.012431907,
            1.013023808,
            1.013620615,
            1.014222317,
            1.014828902,
            1.01544036,
            1.016056681,
            1.016677853,
            1.017303866,
            1.017934711,
            1.018570378,
            1.019210855,
            1.019856135,
            1.020506206,
            1.02116106,
            1.021820687,
            1.022485078,
            1.023154224,
            1.023828116,
            1.024506745,
            1.025190103,
            1.02587818,
            1.026570969,
            1.027268461,
            1.027970647,
            1.02867752,
            1.029389072,
            1.030114973,
            1.030826088,
            1.03155163,
            1.032281819,
            1.03301665,
            1.033756114,
            1.034500204,
            1.035248913,
            1.036002235,
            1.036760162,
            1.037522688,
            1.038289806,
            1.039061509,
            1.039837792,
            1.040618648
        };

        static double FastExp(double x)
        {
            var tmp = (long)(1512775 * x + 1072632447);
            int index = (int)(tmp >> 12) & 0xFF;
            return BitConverter.Int64BitsToDouble(tmp << 32) * ExpAdjustment[index];
        }

        static void Main(string[] args)
        {
            double[] x = new double[1000000];
            double[] ex = new double[x.Length];
            double[] fx = new double[x.Length];
            Random r = new Random();
            for (int i = 0; i < x.Length; ++i)
                x[i] = r.NextDouble() * 40;

            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int j = 0; j < x.Length; ++j)
                ex[j] = Math.Exp(x[j]);
            sw.Stop();
            double builtin = sw.Elapsed.TotalMilliseconds;
            sw.Reset();
            sw.Start();
            for (int k = 0; k < x.Length; ++k)
                fx[k] = FastExp(x[k]);
            sw.Stop();
            double custom = sw.Elapsed.TotalMilliseconds;

            double min = 1, max = 1;
            for (int m = 0; m < x.Length; ++m) {
                double ratio = fx[m] / ex[m];
                if (min > ratio) min = ratio;
                if (max < ratio) max = ratio;
            }

            Console.WriteLine("minimum ratio = " + min.ToString() + ", maximum ratio = " + max.ToString() + ", speedup = " + (builtin / custom).ToString());
         }
    }
}

这篇关于快速精通计算:可提高精度,而不会失去太多的表现?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆