Taking logs and adding versus multiplying


Problem description



If I want to take the product of a list of floating-point numbers, what's the worst-case/average-case precision lost by adding their logs and then taking exp of the sum, as opposed to just multiplying them? Is there ever a case when this is actually more precise?

Solution

Absent any overflow or underflow shenanigans, if a and b are floating-point numbers, then the product a*b will be computed to within a relative error of 1/2 ulp.

A crude bound on the relative error after multiplying a chain of N doubles therefore puts the answer off by a factor of at most (1 - epsilon/2)^(-N), which is about exp(epsilon N / 2); to first order, that factor is 1 + N epsilon / 2. I'd imagine you can expect a deviation of around epsilon sqrt(N) in the average case.
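
To see that growth concretely, here is a quick experiment of my own devising, not part of the original answer (the factor distribution and N are arbitrary choices, and it assumes long double is wider than double on your platform); it multiplies a million doubles near 1 and measures the relative error of the double product against a long double reference:

#include <stdio.h>
#include <stdlib.h>

int main() {
  const int N = 1000000;
  double prod = 1.0;
  long double ref = 1.0L;  /* wider reference product */
  srand(42);
  for (int i = 0; i < N; i++) {
    /* factors near 1 keep the product far from overflow and underflow */
    double x = 1.0 + ((double)rand() / RAND_MAX - 0.5) * 1e-6;
    prod *= x;
    ref *= x;
  }
  /* expect roughly sqrt(N) * epsilon, i.e. on the order of 1e-13 here */
  printf("relative error: %Le\n", (long double)prod / ref - 1.0L);
}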

Exponent overflow and underflow are more likely to happen with this strategy, though; you're more likely to get infinities, zeroes, and NaNs as well as imprecise values due to rounding of subnormals.
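
For instance, here is a small sketch of my own (the inputs are arbitrary, not from the original answer) where the true product is exactly 2^500, comfortably representable, yet the running product overflows to infinity partway through while the sum of logs stays finite:

#include <math.h>
#include <stdio.h>

int main() {
  double foo[] = {0x1p1000, 0x1p1000, 0x1p-1500};
  double prod = 1;
  double sumlogs = 0;
  for (int i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
    prod *= foo[i];          /* overflows to +inf at the second factor */
    sumlogs += log(foo[i]);  /* stays near 500 * log(2), about 346.6 */
  }
  printf("%a %a\n", prod, exp(sumlogs));  /* inf versus roughly 0x1p+500 */
}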

The other approach is more robust in that sense, but it is much slower and errs worse in the case where the straightforward approach doesn't result in an overflow or underflow. Here's a very, very crude analysis for standard doubles in the case where N is at least a couple orders of magnitude smaller than 2^53:

You can always take the log of a finite floating-point number and get a finite floating-point number, so we're cool there. You can add up N floating-point numbers either straightforwardly to get N epsilon worst-case "relative" error and sqrt(N) epsilon expected "relative" error, or using Kahan summation to get about 3 epsilon worst-case "relative" error. Scare quotes are around "relative" because the error is relative to the sum of the absolute values of the things you're summing.
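
For reference, here is what that Kahan summation of the logs can look like. This is a sketch of the standard compensated-summation algorithm, not code from the original answer; the name kahan_sum_logs is mine, and it assumes the compiler doesn't reassociate floating-point arithmetic (e.g. no -ffast-math):

#include <math.h>

double kahan_sum_logs(const double *xs, int n) {
  double sum = 0.0;
  double c = 0.0;              /* running compensation: the lost low-order bits */
  for (int i = 0; i < n; i++) {
    double y = log(xs[i]) - c; /* correct the next term by the compensation */
    double t = sum + y;        /* big + small: low bits of y get rounded away */
    c = (t - sum) - y;         /* algebraically zero; in floats, exactly what was lost */
    sum = t;
  }
  return sum;                  /* feed this to exp() to recover the product */
}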

Notice that no finite double has a logarithm whose absolute value is bigger than 710 or so, so the sum of the absolute values of N logarithms is at most about 710 N. That means our sum-of-logarithms computed using Kahan summation has an absolute error of at most about 3 * 710 N epsilon = 2130 N epsilon. When we exponentiate our sum-of-logarithms, we get something off by a factor of at most exp(2130 N epsilon) from the right answer.

A pathological example for the log-sum-exp approach:

#include <math.h>
#include <stdio.h>

int main() {
  /* one factor near the top of the exponent range and one near the
     bottom; their product is just a hair above 1 */
  double foo[] = {0x1.000000000018cp1023, 0x1.0000000000072p-1023};
  double prod = 1;
  double sumlogs = 0;
  for (int i = 0; i < sizeof(foo) / sizeof(*foo); i++) {
    prod *= foo[i];
    sumlogs += log(foo[i]);
  }
  printf("%a %a\n", foo[0], foo[1]);
  printf("%a %a %a\n", prod, exp(sumlogs), prod - exp(sumlogs));
}

On my platform, I get a difference of 0x1.fep-44. I'm sure there are worse examples.

