How to fix this floating point square root algorithm

Views: 150   Published: 2015/11/30 21:51:27   Tags: algorithm matlab math floating-point computer-science


Problem description

I am trying to compute the IEEE-754 32-bit floating-point square root of various inputs. For one particular input, the algorithm below (based on the Newton-Raphson method) won't converge. What can I do to fix the problem? The platform I am designing has a 32-bit floating-point adder/subtracter, multiplier, and divider.

For input 0x7F7FFFFF (3.4028234663852886E38), the algorithm won't converge to the correct answer of 18446743523953729536.000000. Instead it gives 18446743523953737728.000000.

I am prototyping the code in MATLAB before implementing it in hardware. I can only use single-precision floating-point values (so no doubles).
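The question's update S = (S + R/S)/2 can be reproduced outside MATLAB with a minimal C sketch that uses only single-precision add, divide and halving. It makes several assumptions:

- It borrows the range reduction from the answer below: `frexpf`/`ldexpf`, with the exponent made even.
- It starts from s = 1 and runs a fixed six iterations.
- The names `naive_sqrtf` and `ulp_distance` are hypothetical.
- It assumes plain IEEE single arithmetic with no excess precision.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Number of representable floats between two positive finite floats.
   Positive floats are ordered the same way as their bit patterns. */
static int64_t ulp_distance(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua > ub ? (int64_t)ua - ub : (int64_t)ub - ua;
}

/* Hypothetical baseline: the question's Newton update done in float.
   Only meant for positive, finite, normal inputs. */
static float naive_sqrtf(float r) {
    int expo;
    float m = frexpf(r, &expo);            /* r = m * 2^expo, m in [0.5, 1) */
    if (expo & 1) { m *= 2.0f; expo--; }   /* make expo even; m in [0.5, 2) */
    float s = 1.0f;                        /* crude initial guess */
    for (int i = 0; i < 6; i++)
        s = (s + m / s) / 2.0f;            /* S = (S + R/S)/2 */
    return ldexpf(s, expo / 2);            /* sqrt(r) = sqrt(m) * 2^(expo/2) */
}
```

For 0x7F7FFFFF the reduced mantissa is m = 1 - 2^-24. With s = 1, the sum s + m/s is exactly 2 - 2^-24, which is a tie that rounds to 2.0, so s never moves off 1.0. In this sketch, then, `naive_sqrtf` returns 2^64, one ulp above the correctly rounded `sqrtf` result.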
clc; clear; close all;

% Input
R = typecast(uint32(hex2dec(num2str(dec2hex(((hex2dec('7F7FFFFF'))))))),'single')

% Initial estimate
OneOverRoot2 = single(1/sqrt(2));
Root2 = single(sqrt(2));

% Get low and high bits of input R
hexdata_high = bitand(bitshift(hex2dec(num2hex(single(R))),-16),hex2dec('ffff'));
hexdata_low = bitand(hex2dec(num2hex(single(R))),hex2dec('ffff'));

% Change exponent of input to -1 to get Mantissa
temp = bitand(hexdata_high,hex2dec('807F'));
Expo = bitshift(bitand(hexdata_high,hex2dec('7F80')),-7);
hexdata_high = bitor(temp,hex2dec('3F00'));
b = typecast(uint32(hex2dec(num2str(dec2hex(((bitshift(hexdata_high,16)+ hexdata_low)))))),'single');

% If exponent is odd ...
if (bitand(Expo,1))
    % Pretend the mantissa [0.5 ... 1.0) is multiplied by 2 as Expo is odd,
    %   so it now has the value [1.0 ... 2.0)
    % Estimate the sqrt(mantissa) as [1.0 ... sqrt(2))
    % IOW: linearly map (0.5 ... 1.0) to (1.0 ... sqrt(2))
    Mantissa = (Root2 - 1.0)/(1.0 - 0.5)*(b - 0.5) + 1.0;
else
    % The mantissa is in range [0.5 ... 1.0)
    % Estimate the sqrt(mantissa) as [1/sqrt(2) ... 1.0)
    % IOW: linearly map (0.5 ... 1.0) to (1/sqrt(2) ... 1.0)
    Mantissa = (1.0 - OneOverRoot2)/(1.0 - 0.5)*(b - 0.5) + OneOverRoot2;
end

newS = Mantissa*2^(bitshift(Expo-127,-1));
S=newS

% S = (S + R/S)/2 method
for j = 1:6 
    fprintf('S  %u %f %f\n', j, S, (S-sqrt(R)));
    S = single((single(S) + single(single(R)/single(S))))/2;
    S = single(S);
end

goodaccuracy =  (abs((single(S)-single(sqrt(single(R)))))) < 2^-23
difference = (abs((single(S)-single(sqrt(single(R))))))

% Get hexadecimal output
hexdata_high = (bitand(bitshift(hex2dec(num2hex(single(S))),-16),hex2dec('ffff')));
hexdata_low = (bitand(hex2dec(num2hex(single(S))),hex2dec('ffff')));
fprintf('FLOAT: T  Input: %e\t\tCorrect: %e\t\tMy answer: %e\n', R, sqrt(R), S);
fprintf('output hex = 0x%04X%04X\n',hexdata_high,hexdata_low);
out = hex2dec(num2hex(single(S)));

Solution
I took a whack at this.  Here's what I came up with:
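#include <math.h>  // frexpf, ldexpf

float mysqrtf(float f) {
  if (f < 0) return 0.0f/0.0f;     // negative input: NaN
  if (f == 0) return f;            // sqrt(+0) = +0, sqrt(-0) = -0; the iteration below would not reach 0
  if (f == 1.0f / 0.0f) return f;  // +infinity
  if (f != f) return f;            // NaN propagates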

  // half-ass an initial guess of 1.0.
  int expo;
  float foo = frexpf(f, &expo);
  float s = 1.0;
  if (expo & 1) foo *= 2, expo--;

  // this is the only case for which what's below fails.
  if (foo == 0x0.ffffffp+0) return ldexpf(0x0.ffffffp+0, expo/2);

  // do four newton iterations.
  for (int i = 0; i < 4; i++) {
    float diff = s*s-foo;
    diff /= s;
    s -= diff/2;
  }

  // do one last newton iteration, computing s*s-foo exactly.
  float scal = s >= 1 ? 4096 : 2048;
  float shi = (s + scal) - scal; // high 12 bits of significand
  float slo = s - shi; // rest of significand
  float diff = shi * shi - foo; // subtraction exact by sterbenz's theorem
  diff += 2 * shi * slo; // opposite signs; exact by sterbenz's theorem
  diff += slo * slo;
  diff /= s; // diff == fma(s, s, -foo) / s.
  s -= diff/2;

  return ldexpf(s, expo/2);
}
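The range reduction at the top of `mysqrtf` can be isolated into a minimal C sketch. `reduce` is a hypothetical helper name. It writes x = m * 2^e with e even and m in [0.5, 2), so sqrt(x) = sqrt(m) * 2^(e/2).

For the question's input, 0x7F7FFFFF = (1 - 2^-24) * 2^128, so m is exactly 0x0.ffffffp+0 and e = 128. That input therefore takes the special-cased early return, which gives ldexpf(0x0.ffffffp+0, 64) = 0x1.fffffep+63.

```c
#include <math.h>

/* Hypothetical helper mirroring the answer's range reduction.
   Assumes x is positive and finite. */
static float reduce(float x, int *e) {
    float m = frexpf(x, e);              /* x = m * 2^e, m in [0.5, 1) */
    if (*e & 1) { m *= 2.0f; (*e)--; }   /* odd exponent: fold a factor 2 into m */
    return m;                            /* m in [0.5, 2), *e even */
}
```

Folding the odd factor of 2 into the mantissa, rather than into the result, keeps `e / 2` an exact integer division for negative exponents too. For example, 0.25 reduces to m = 1 with e = -2.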
The first thing to analyse is the formula (s*s-foo)/s in floating-point arithmetic. If s is a sufficiently good approximation to sqrt(foo), Sterbenz's theorem tells us the subtraction is exact. The numerator is therefore within an ulp(foo) of the right answer, and all of that error comes from rounding the product s*s. Then we divide by s, which adds at worst another half-ulp of error. So, even without a fused multiply-add, diff is within 1.5 ulp of what it should be. And we divide it by two.
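Sterbenz's theorem (often called Sterbenz's lemma) says that if y/2 <= x <= 2y, then x - y is exactly representable, so the rounded float subtraction introduces no error. Here is a minimal randomized check in C, assuming IEEE single arithmetic. The linear congruential generator and the helper names are illustrative choices, not part of the answer.

```c
#include <stdint.h>

/* Small deterministic generator so the check is reproducible. */
static uint32_t lcg(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Float in roughly [0.5, 2] built from 24 random bits. */
static float rand_unit(uint32_t *state) {
    return 0.5f + 1.5f * (float)(lcg(state) >> 8) / 16777216.0f;
}

/* Count pairs satisfying y/2 <= x <= 2y whose float difference is not
   exact. Sterbenz's theorem predicts 0. The reference x - y in double
   is exact because both operands are floats of similar size. */
static int sterbenz_violations(int trials) {
    uint32_t state = 12345u;
    int bad = 0;
    for (int i = 0; i < trials; i++) {
        float x = rand_unit(&state), y = rand_unit(&state);
        if (!(y / 2 <= x && x <= 2 * y)) continue;  /* outside the hypothesis */
        float d = x - y;                            /* one rounded subtraction */
        if ((double)d != (double)x - (double)y)
            bad++;
    }
    return bad;
}
```

Outside the hypothesis the subtraction can round. For example, `1.0f - 0x1p-30f` rounds back to 1.0f.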

Notice that the initial guess doesn't in and of itself matter as long as you follow it up with enough Newton iterations.

Measure the error of an approximation s to sqrt(foo) by abs(s - foo/s). The error of my initial guess of 1 is at most 1. A Newton iteration in exact arithmetic squares the error and divides it by 4. A Newton iteration in floating-point arithmetic (the kind I do four times) squares the error, divides it by 4, and adds up to another 0.75 ulp of error. Do this four times and you find a relative error of at most 0x0.000000C4018384, which is about 0.77 ulp. This means that four Newton iterations yield a faithfully rounded result.
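The bound above can be reproduced by iterating the recurrence e_{k+1} = e_k^2 / 4 + 0.75 ulp from e_0 = 1. A C sketch follows. Taking one ulp as 2^-24 is an assumption: it is the unit that reproduces the quoted figure. `newton_error_bound` is a hypothetical name.

```c
#include <math.h>

/* Error bound after `steps` float Newton iterations from the guess s = 1,
   following e_{k+1} = e_k^2 / 4 + 0.75 ulp.
   Assumption: ulp = 2^-24 (relative to values near 1). */
static double newton_error_bound(int steps) {
    const double ulp = ldexp(1.0, -24);
    double e = 1.0;                       /* error of the initial guess */
    for (int k = 0; k < steps; k++)
        e = e * e / 4.0 + 0.75 * ulp;     /* square, quarter, add rounding */
    return e;
}
```

Three steps leave a bound near 2^-14. After four, `newton_error_bound(4) / 0x1p-24` is about 0.766, below one ulp, which matches the faithful-rounding claim.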

I do a fifth Newton step to get a correctly rounded square root. Why that works is a little more intricate.

shi holds the "top half" of s, while slo holds the "bottom half". Each has at most 12 significant bits, so the last 12 bits of each 24-bit significand are zero. This means, in particular, that shi * shi, shi * slo and slo * slo are exactly representable as floats.

s*s is within two ulps of foo.  shi*shi is within 2047 ulps of s*s.  Thus shi * shi - foo is within 2049 ulps of zero; in particular, it's exactly representable and less than 2^-10.

You can check that adding 2 * shi * slo gives an exactly representable result within 2^-22 of zero. Adding slo * slo then gives another exactly representable result, so s*s-foo is computed exactly.

When you divide by s, you add another half-ulp of error, which here is at most 2^-48 because the error was already so small.

Now we do a Newton step.  We've computed the current error correctly to within 2^-46.  Adding half of it to s gives us the square root to within 3*2^-48.

To turn this into a guarantee of correct rounding, we need to prove that there are no floats between 1/2 and 2, other than the one I special-cased, whose square roots are within 3*2^-48 of a midpoint between two consecutive floats.  You can do some error analysis, get a Diophantine equation, find all of the solutions of that Diophantine equation, find which inputs they correspond to, and work out what the algorithm does on those.  (If you do this, there is one "physical" solution and a bunch of "unphysical" solutions.  The one real solution is the only thing I special-cased.)  There may be a cleaner way, however.
