Properties of 80-bit extended precision computations starting from double precision arguments

Views: 165 | Published: 2016/8/18 13:34:39 | Tags: c floating-point ieee-754 extended-precision

Question

Here are two implementations of an interpolation function. The argument u1 is always between 0.0 and 1.0.

#include <stdio.h>

double interpol_64(double u1, double u2, double u3)
{ 
  return u2 * (1.0 - u1) + u1 * u3;  
}

double interpol_80(double u1, double u2, double u3)
{ 
  return u2 * (1.0 - (long double)u1) + u1 * (long double)u3;  
}

int main()
{
  double y64,y80,u1,u2,u3;
  u1 = 0.025;
  u2 = 0.195;
  u3 = 0.195;
  y64 = interpol_64(u1, u2, u3);
  y80 = interpol_80(u1, u2, u3);
  printf("u2: %a\ny64:%a\ny80:%a\n", u2, y64, y80);
}

On a strict IEEE 754 platform with 80-bit long doubles, all computations in interpol_64() are done according to IEEE 754 double precision, and in interpol_80() in 80-bit extended precision. The program prints:

u2: 0x1.8f5c28f5c28f6p-3
y64:0x1.8f5c28f5c28f5p-3
y80:0x1.8f5c28f5c28f6p-3

I am interested in the property "the result returned by the function is always in-between u2 and u3". This property is false of interpol_64(), as shown by the values in the main() above.
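As a minimal sketch, the property can be written as a small predicate. The name in_between() is hypothetical (it is not part of the question); it treats "in-between" as membership in the closed interval spanned by its last two arguments.

```c
/* Hypothetical helper: returns 1 when y lies in the closed interval
   spanned by a and b (in either order), 0 otherwise. */
int in_between(double y, double a, double b)
{
  double lo = a < b ? a : b;
  double hi = a < b ? b : a;
  return lo <= y && y <= hi;
}
```

Applied to the values printed above, the predicate rejects y64 (0x1.8f5c28f5c28f5p-3) and accepts y80 (0x1.8f5c28f5c28f6p-3), since u2 and u3 are both 0x1.8f5c28f5c28f6p-3.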

Does the property have a chance to be true of interpol_80()? If it isn't, what is a counter-example? Does it help if we know that u2 != u3 or that there is a minimum distance between them? Is there a method to determine a significand width for intermediate computations at which the property would be guaranteed to be true?

Edit: On all the random values I tried, the property held when intermediate computations were done in extended precision internally. If interpol_80() took long double arguments, it would be relatively easy to build a counter-example, but the question here is specifically about a function that takes double arguments. This makes it much harder to build a counter-example, if there is one.

Note: a compiler generating x87 instructions may generate the same code for interpol_64() and interpol_80(), but this is tangential to my question.

Recommended answer

Yes, interpol_80() is safe. Let's demonstrate it.

The problem states that the inputs are 64-bit floats:

rnd64(ui) = ui

The exact result (assuming * and + are mathematical operations) is

r = u2*(1-u1)+(u1*u3)

The optimal return value, rounded to a 64-bit float, is

r64 = rnd64(r)

Since we have the property

u2 <= r <= u3

it is guaranteed, because rounding is monotonic, that

rnd64(u2) <= rnd64(r) <= rnd64(u3)
u2 <= r64 <= u3

The conversions of u1, u2 and u3 to 80 bits are exact too:

rnd80(ui)=ui
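The exactness of the widening conversion can be checked with a round trip, as in the sketch below. The function name widening_is_exact() is hypothetical. C requires every double value to be representable as a long double, so the check is expected to succeed for any non-NaN input (a NaN compares unequal to itself and would be reported as 0).

```c
/* Hypothetical helper: widen a double to long double and narrow it back.
   Returns 1 when the round trip reproduces x exactly. */
int widening_is_exact(double x)
{
  long double wide = (long double)x;  /* rnd80(x), expected to be exact */
  return (double)wide == x;
}
```

For the inputs used in the question, widening_is_exact(0.025) and widening_is_exact(0.195) should both return 1.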

Now, let's assume 0 <= u2 <= u3. Performing the computation with inexact floating-point operations then leads to at most 4 rounding errors:

rf = rnd(rnd(u2*rnd(1-u1)) + rnd(u1*u3))

Assuming round-to-nearest-even, this will be at most 2 ULP off the exact value. If the rounding is performed with 64-bit floats or with 80-bit floats:

r - 2 ulp64(r) <= rf64 <= r + 2 ulp64(r)
r - 2 ulp80(r) <= rf80 <= r + 2 ulp80(r)
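To make the four roundings in rf visible, the 64-bit variant can be spelled out one operation at a time. This is a sketch, not the code from the question: the name rf64_steps() is hypothetical, and the volatile temporaries are an assumption intended to make the compiler store each intermediate as a double instead of fusing or widening operations.

```c
/* Hypothetical step-by-step version of interpol_64(): each statement
   corresponds to one rnd() in
   rf = rnd(rnd(u2*rnd(1-u1)) + rnd(u1*u3)). */
double rf64_steps(double u1, double u2, double u3)
{
  volatile double a = 1.0 - u1;  /* rnd(1-u1) */
  volatile double b = u2 * a;    /* rnd(u2*rnd(1-u1)) */
  volatile double c = u1 * u3;   /* rnd(u1*u3) */
  volatile double d = b + c;     /* outermost rnd() */
  return d;
}
```

With u1 = 0.025 and u2 = u3 = 0.195 the exact result r equals u2, so the returned value should stay within 2 ulp64 of 0.195. The question reports a result one ulp below u2 on a strict IEEE 754 platform.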

rf64 can be off by 2 ulp, so interpol_64() is unsafe, but what about rnd64(rf80)?
We can tell that:

rnd64(r - 2 ulp80(r)) <= rnd64(rf80) <= rnd64(r + 2 ulp80(r))

Since 0 <= u2 <= u3, then

ulp80(u2) <= ulp80(r) <= ulp80(u3)
rnd64(u2 - 2 ulp80(u2)) <= rnd64(r - 2 ulp80(r)) <= rnd64(rf80)
rnd64(u3 + 2 ulp80(u3)) >= rnd64(r + 2 ulp80(r)) >= rnd64(rf80)

Fortunately, as with every number in the range (u2 - ulp64(u2)/2, u2 + ulp64(u2)/2), we get

rnd64(u2 - 2 ulp80(u2)) = u2
rnd64(u3 + 2 ulp80(u3)) = u3

since ulp80(x) = ulp64(x) / 2^(64-53)

We thus get the proof:

u2 <= rnd64(rf80) <= u3
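The absorption step, rnd64(u2 - 2 ulp80(u2)) = u2, can be checked numerically for a given value. The sketch below is an illustration under assumptions: the helper names are hypothetical, ulp80() is computed by plain scaling instead of a library call and only handles positive normal values, and long double is assumed to have at least a 64-bit significand.

```c
#include <float.h>

/* Spacing of long doubles around x; assumes x is positive and normal. */
long double ulp80(long double x)
{
  long double p = 1.0L;
  while (p > x) p /= 2;        /* lower p until p <= x */
  while (2 * p <= x) p *= 2;   /* raise p until p <= x < 2p */
  return p * LDBL_EPSILON;     /* LDBL_EPSILON is the spacing at 1.0 */
}

/* Returns 1 when u - 2 ulp80(u) and u + 2 ulp80(u) both round back
   to u when narrowed to double. */
int absorbs_two_ulp80(double u)
{
  long double e = 2 * ulp80(u);
  double below = (double)(u - e);  /* rnd64(u - 2 ulp80(u)) */
  double above = (double)(u + e);  /* rnd64(u + 2 ulp80(u)) */
  return below == u && above == u;
}
```

On a platform where long double is no wider than double the check is expected to fail, because 2 ulp80(u) is then a full 2 ulp64(u) and is no longer absorbed by the final rounding.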

For u2 <= u3 <= 0, we can easily apply the same proof.

The last case to be studied is u2 <= 0 <= u3. If we subtract 2 big values, then the result can be up to ulp(big)/2 off rather than ulp(big-big)/2...
Thus this assertion we made doesn't hold anymore:

r - 2 ulp64(r) <= rf64 <= r + 2 ulp64(r)

Fortunately, u2 <= u2*(1-u1) <= 0 <= u1*u3 <= u3, and this is preserved after rounding:

u2 <= rnd(u2*rnd(1-u1)) <= 0 <= rnd(u1*u3) <= u3

Thus, since the added quantities are of opposite signs:

u2 <= rnd(u2*rnd(1-u1)) + rnd(u1*u3) <= u3

The same holds after rounding, so we can once again guarantee

u2 <= rnd64( rf80 ) <= u3

QED
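The mixed-sign argument does not rely on the width of the intermediate format, so it can be exercised even with the plain double version. The sketch below sweeps a small grid with u2 < 0 < u3; the grid, its step sizes and the function name count_mixed_sign_violations() are arbitrary choices made for illustration.

```c
double interpol_64(double u1, double u2, double u3)
{
  return u2 * (1.0 - u1) + u1 * u3;
}

/* Sweep a small grid with u2 < 0 < u3 and count results that fall
   outside [u2, u3]. */
int count_mixed_sign_violations(void)
{
  int bad = 0;
  for (int i = 0; i <= 64; i++) {
    double u1 = i / 64.0;            /* u1 in [0, 1] */
    for (int j = 1; j <= 20; j++) {
      double u2 = -0.37 * j;         /* strictly negative */
      for (int k = 1; k <= 20; k++) {
        double u3 = 0.11 * k;        /* strictly positive */
        double y = interpol_64(u1, u2, u3);
        if (y < u2 || y > u3) bad++;
      }
    }
  }
  return bad;
}
```

If the reasoning above is right, the count stays at 0: each rounded term keeps its sign and its bound, so their sum cannot leave [u2, u3].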

To be complete, we should take care of denormal inputs (gradual underflow), but I hope you won't be that vicious with stress tests. I won't demonstrate what happens with those...

EDIT

Here is a follow-up, as the following assertion was a bit approximate and generated some comments in the case 0 <= u2 <= u3:

r - 2 ulp80(r) <= rf80 <= r + 2 ulp80(r)

We can write the following inequalities:

rnd(1-u1) <= 1
rnd(1-u1) <= 1-u1+ulp(1)/4
u2*rnd(1-u1) <= u2 <= r
u2*rnd(1-u1) <= u2*(1-u1)+u2*ulp(1)/4
u2*ulp(1) < 2*ulp(u2) <= 2*ulp(r)
u2*rnd(1-u1) < u2*(1-u1)+ulp(r)/2

For the next rounding operation, we use

ulp(u2*rnd(1-u1)) <= ulp(r)
rnd(u2*rnd(1-u1)) < u2*(1-u1)+ulp(r)/2 + ulp(u2*rnd(1-u1))/2
rnd(u2*rnd(1-u1)) < u2*(1-u1)+ulp(r)/2 + ulp(r)/2
rnd(u2*rnd(1-u1)) < u2*(1-u1)+ulp(r)

For the second part of the sum, we have:

u1*u3 <= r
rnd(u1*u3) <= u1*u3 + ulp(u1*u3)/2
rnd(u1*u3) <= u1*u3 + ulp(r)/2

rnd(u2*rnd(1-u1))+rnd(u1*u3) < u2*(1-u1)+u1*u3 + 3*ulp(r)/2
rnd(rnd(u2*rnd(1-u1))+rnd(u1*u3)) < r + 3*ulp(r)/2 + ulp(r+3*ulp(r)/2)/2
ulp(r+3*ulp(r)/2) <= 2*ulp(r)
rnd(rnd(u2*rnd(1-u1))+rnd(u1*u3)) < r + 5*ulp(r)/2

I didn't prove the original claim, but this is not that far off...
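As a sketch of why the weaker 5/2 ulp bound still appears to be enough on the upper side, it can be chained with r <= u3, ulp80(r) <= ulp80(u3) and ulp80(x) = ulp64(x) / 2^11, all taken from the derivation above. The lower side is not treated here and would need an analogous argument.

```latex
\begin{align*}
\mathrm{rf}_{80} &< r + \tfrac{5}{2}\,\mathrm{ulp}_{80}(r)
  \le u_3 + \tfrac{5}{2}\,\mathrm{ulp}_{80}(u_3) \\
 &= u_3 + \tfrac{5}{2}\cdot 2^{-11}\,\mathrm{ulp}_{64}(u_3)
  < u_3 + \tfrac{1}{2}\,\mathrm{ulp}_{64}(u_3) \\
\Rightarrow\quad \mathrm{rnd}_{64}(\mathrm{rf}_{80}) &\le u_3
\end{align*}
```

The last line uses the fact that any value strictly below u3 + ulp64(u3)/2 rounds to a double no larger than u3.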
