Is there any way to get correct rounding with the i387 fsqrt instruction?


Problem description


Is there any way to get correct rounding with the i387 fsqrt instruction?...

...aside from changing the precision mode in the x87 control word - I know that's possible, but it's not a reasonable solution because it has nasty reentrancy-type issues where the precision mode will be wrong if the sqrt operation is interrupted.
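For reference, the control-word approach being ruled out here might look like the sketch below. It assumes GCC or Clang inline assembly on an x86 target; the name `sqrt_pc53` is made up for illustration.

```c
/* Sketch only: assumes GCC/Clang extended asm on x86. */
static double sqrt_pc53(double x) {
    unsigned short saved, cw;
    long double r = x;                       /* widening to extended is exact */

    __asm__ volatile ("fnstcw %0" : "=m"(saved));
    /* Precision-control field is bits 8-9; 10b selects a 53-bit significand. */
    cw = (unsigned short)((saved & ~0x0300) | 0x0200);
    __asm__ volatile ("fldcw %0" : : "m"(cw));

    __asm__ volatile ("fsqrt" : "+t"(r));    /* rounds once, to 53 bits */

    __asm__ volatile ("fldcw %0" : : "m"(saved));
    return (double)r;                        /* already fits a double's significand */
}
```

Anything that interrupts the thread between the two `fldcw` instructions and uses the x87 unit runs with the reduced precision setting, which is the reentrancy concern.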

The issue I'm dealing with is as follows: the x87 fsqrt opcode performs a correctly-rounded (per IEEE 754) square root operation in the precision of the fpu registers, which I'll assume is extended (80-bit) precision. However, I want to use it to implement efficient single and double precision square root functions with the results correctly rounded (per the current rounding mode). Since the result has excess precision, the second step of converting the result to single or double precision rounds again, possibly leaving a not-correctly-rounded result.

With some operations it's possible to work around this with biases. For instance, I can avoid excess precision in the results of addition by adding a bias in the form of a power of two that forces the 52 significant bits of a double precision value into the last 52 bits of the 63-bit extended-precision mantissa. But I don't see any obvious way to do such a trick with square root.

Any clever ideas?

(Also tagged C because the intended application is implementation of the C sqrt and sqrtf functions.)

Solution

First, let's get the obvious out of the way: you should be using SSE instead of x87. The SSE sqrtss and sqrtsd instructions do exactly what you want, are supported on all modern x86 systems, and are significantly faster as well.
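A minimal sketch of the SSE route using compiler intrinsics, assuming an x86 target with SSE2 enabled (the default on x86-64). The wrapper names `sse_sqrt` and `sse_sqrtf` are illustrative only.

```c
#include <emmintrin.h>   /* SSE2 intrinsics; also pulls in the SSE ones */

/* sqrtsd: square root rounded once, directly to double precision. */
static double sse_sqrt(double x) {
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
}

/* sqrtss: square root rounded once, directly to single precision. */
static float sse_sqrtf(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}
```

Because each instruction rounds straight to its own format, there is no second rounding step to worry about.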

Now, if you insist on using x87, I'll start with the good news: you don't need to do anything for float. You need 2p + 2 bits to compute a correctly rounded square root in a p-bit floating-point format. The 80-bit extended format carries a 64-bit significand, and 64 > 2*24 + 2 = 50, so the additional rounding to single precision will always round correctly, and you have a correctly rounded square root.
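For the float case, a sketch could be as short as this. It assumes GCC or Clang inline assembly on x86, `long double` being the x87 80-bit format, and the precision-control field left at its extended setting; `x87_fsqrt` and `sqrtf_via_x87` are hypothetical names.

```c
/* Raw fsqrt on the top of the x87 stack (GCC/Clang "t" constraint). */
static long double x87_fsqrt(long double x) {
    __asm__("fsqrt" : "+t"(x));
    return x;
}

/* The result is rounded to the extended significand first and to 24 bits second;
   per the 2p + 2 argument the second rounding cannot change the answer. */
static float sqrtf_via_x87(float x) {
    return (float)x87_fsqrt((long double)x);
}
```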

Now the bad news: the 64-bit significand of the extended format falls short of 2*53 + 2 = 108 bits, so no such luck for double precision. I can suggest several workarounds; here's a nice easy one off the top of my head.

  1. let y = round_to_double(x87_square_root(x));
  2. use a Dekker (head-tail) product to compute a and b such that y*y = a + b exactly.
  3. compute the residual r = x - a - b.
  4. if (r == 0), return y.
  5. if (r > 0), let y1 = y + 1 ulp, and compute a1, b1 such that y1*y1 = a1 + b1 exactly. Compare r1 = x - a1 - b1 to r, and return either y or y1, depending on which has the smaller residual in magnitude (or the one with zero low-order bit, if the residuals are equal in magnitude).
  6. if (r < 0), do the same thing for y1 = y - 1 ulp.

This procedure only handles the default rounding mode (round to nearest); however, in the directed rounding modes, simply rounding to the destination format does the right thing.
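The six steps above can be sketched in C roughly as follows. This is an illustration under several assumptions, not a vetted libm routine: GCC or Clang inline assembly on x86, round-to-nearest, x87 precision control at extended, and plain `double` arithmetic that really is evaluated in double precision (for example x86-64, where only the `fsqrt` goes through the x87 unit). On a pure x87 build the intermediates would additionally need to be forced to double. The helper names and the rescaling thresholds are my own choices; the rescaling is an addition that keeps `y*y` and its error term away from overflow and underflow.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

static long double x87_fsqrt(long double x) {
    __asm__("fsqrt" : "+t"(x));
    return x;
}

/* Adjacent double above (step = 1) or below (step = -1) a positive finite y. */
static double neighbour(double y, int step) {
    uint64_t bits;
    memcpy(&bits, &y, sizeof bits);
    bits += (uint64_t)(int64_t)step;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Returns x - y*y. Veltkamp split plus Dekker product give hi + lo == y*y
   exactly, provided nothing overflows or underflows. */
static double residual(double x, double y) {
    double c  = 134217729.0 * y;            /* 2^27 + 1 */
    double yh = c - (c - y);
    double yl = y - yh;
    double hi = y * y;
    double e  = hi - yh * yh;
    e -= yl * yh;
    e -= yh * yl;
    double lo = yl * yl - e;
    return (x - hi) - lo;
}

static double sqrt_via_x87(double x) {
    if (!(x > 0.0) || x > DBL_MAX)          /* zeros, negatives, NaN, +inf */
        return (double)x87_fsqrt(x);

    double unscale = 1.0;                   /* exact power-of-two rescaling */
    if (x < 0x1p-900)     { x *= 0x1p200;  unscale = 0x1p-100; }
    else if (x > 0x1p900) { x *= 0x1p-200; unscale = 0x1p100; }

    double y = (double)x87_fsqrt(x);        /* step 1: the second rounding */
    double r = residual(x, y);              /* steps 2 and 3 */
    if (r != 0.0) {                         /* step 4 falls through when r == 0 */
        double y1  = neighbour(y, r > 0.0 ? 1 : -1);   /* steps 5 and 6 */
        double r1  = residual(x, y1);
        double ar  = r  < 0.0 ? -r  : r;
        double ar1 = r1 < 0.0 ? -r1 : r1;
        uint64_t b1;
        memcpy(&b1, &y1, sizeof b1);
        if (ar1 < ar || (ar1 == ar && (b1 & 1) == 0))
            y = y1;
    }
    return y * unscale;
}
```

One input that should exercise the correction is `0x1.fffffffffffffp-1`, the largest double below 1: the extended result lands exactly halfway between that value and 1.0, so the plain conversion in step 1 rounds to 1.0, and the residual comparison then moves the answer back down by one ulp.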
