Fast square root optimization?

Question

If you check this very nice page:

http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

You'll see this program:

#define SQRT_MAGIC_F 0x5f3759df
float sqrt2(const float x)
{
  const float xhalf = 0.5f*x;

  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0
  return x*u.x*(1.5f - xhalf*u.x*u.x);// Newton step, repeating increases accuracy 
}

My question is: Is there any particular reason why this isn't implemented as:

#define SQRT_MAGIC_F 0x5f3759df
float sqrt2(const float x)
{

  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0

  const float xux = x*u.x;

  return xux*(1.5f - .5f*xux*u.x);// Newton step, repeating increases accuracy 
}

From the disassembly, I see one fewer MUL. Is there any purpose to having xhalf appear at all?
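
For reference, here is a short sketch of why the two versions are algebraically the same. The symbols y_0 (the initial guess held in u.x), y_1 and p are introduced here for illustration and do not appear in the original post. Both versions apply one Newton step for 1/sqrt(x) and then multiply by x:

```latex
% Newton step for f(y) = 1/y^2 - x, whose root is y = 1/\sqrt{x}
y_1 = y_0\left(\tfrac{3}{2} - \tfrac{x}{2}\,y_0^{2}\right),
\qquad
\sqrt{x} \approx x\,y_1 = x\,y_0\left(\tfrac{3}{2} - \tfrac{x}{2}\,y_0^{2}\right)

% Second version: reuse the product p = x\,y_0
x\,y_0\left(\tfrac{3}{2} - \tfrac{1}{2}\,(x\,y_0)\,y_0\right)
= p\left(\tfrac{3}{2} - \tfrac{1}{2}\,p\,y_0\right)
```

Counting multiplications as written in the source, the first version needs five (0.5f*x, x*u.x, xhalf*u.x, a further *u.x, and the final product), while the second needs four, which matches the one-MUL difference seen in the disassembly. The two forms are equal in exact arithmetic but may round differently in floating point.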

Recommended answer

It could be that legacy floating-point math, which used 80-bit registers, was more accurate when the multiplications were chained together in the last line, because the intermediate results were kept in 80-bit registers.

The first multiplication in the upper implementation takes place in parallel with the integer math that follows, since they use different execution resources. The second function, on the other hand, looks faster, but it's hard to tell whether it really is, for the reason above. Also, the const float xux = x*u.x; statement rounds the result back to a 32-bit float, which may reduce overall accuracy.
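
Whether intermediates are actually kept at higher precision depends on the compiler and target. As a minimal sketch, the standard C99 macro FLT_EVAL_METHOD from <float.h> reports this for the current build. The helper names eval_method_name and report_eval_method are made up for this example:

```c
#include <float.h>
#include <stdio.h>

/* Map the standard C99 macro FLT_EVAL_METHOD to a readable description. */
const char *eval_method_name(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "float expressions are evaluated in float";
    case 1:  return "float expressions are evaluated in double";
    case 2:  return "float expressions are evaluated in long double";
    default: return "evaluation method is indeterminable or implementation-defined";
    }
}

/* Hypothetical helper: print the setting for the current build. */
void report_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d: %s\n",
           (int)FLT_EVAL_METHOD, eval_method_name());
}
```

A value of 2 is typical of x87 builds, where the 80-bit argument above applies. A value of 0 is typical of SSE builds, where every float operation is already rounded to 32 bits, so storing x*u.x in a float variable should cost no extra precision there.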

You could test these functions head to head and compare them to the sqrt function in math.h (use double, not float, for the reference value). This way you can see which is faster and which is more accurate.
