使用成对求和,我需要多少项才能得出明显错误的结果? [英] With pairwise summation, how many terms do I need to get an appreciably wrong result?

查看:133
本文介绍了使用成对求和,我需要多少项才能得出明显错误的结果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用给定种类的fp数(例如float16),直接构造具有完全错误结果的和.例如,使用python/numpy:

Using a given species of fp numbers, say float16, it is straight forward to construct sums with totally wrong results. For example, using python/numpy:

import numpy as np

one = np.float16(1)
ope = np.nextafter(one,one+one)

np.array((ope,one,-one,-one)).cumsum()
# array([1.001, 2.   , 1.   , 0.   ], dtype=float16)

在这里,我们已使用cumsum强制进行幼稚求和.留给自己的设备numpy将使用不同的求和顺序,从而产生更好的答案:

Here we have used cumsum to force naive summation. Left to its own devices numpy would have used a different order of summation, yielding a better answer:

np.array((ope,one,-one,-one)).sum()
# 0.000977

以上内容基于取消.为了排除此类示例,让我们仅允许使用非否定术语.对于幼稚的求和,给出带有非常错误的求和的例子仍然很容易.以下求和10 ^ 4个相同的项,每个项等于10 ^ -4:

The above is based on cancellation. To rule out this class of examples, let us only allow non negative terms. For naive summation it is still easy to give examples with very wrong sums. The following sums 10^4 identical terms each equal to 10^-4:

np.full(10**4,10**-4,np.float16).cumsum()
# array([1.0e-04, 2.0e-04, 3.0e-04, ..., 2.5e-01, 2.5e-01, 2.5e-01],
  dtype=float16)

最后一个词相差4倍.

同样,允许numpy使用成对求和会产生更好的结果:

Again, allowing numpy to use pairwise summation gives a much better result:

np.full(10**4,10**-4,np.float16).sum()
# 1.0

可以构造超过成对求和的和.选择分辨率低于1的eps时,我们可以使用1,eps,0,eps,3x0,eps,7x0,eps,15x0,eps,...,但这涉及到疯狂的术语.

It is possible to construct sums that beat pairwise summation. Choosing eps below resolution at 1 we can use 1, eps, 0, eps, 3x0, eps, 7x0, eps, 15x0, eps, ..., but this involves an insane number of terms.

我的问题:使用float16和仅使用非否定项,从成对求和中获得至少相差2倍的结果需要多少个项.

My question: Using float16 and only non negative terms, how many terms are required to obtain from pairwise summation a result that is off by at least a factor of 2.

奖金:相同的问题是积极"而不是非否定".甚至有可能吗?

Bonus: Same question with "positive" instead of "non negative". Is it even possible?

推荐答案

深度1432(因此2 ^ 1432项)足以使真实总和比计算出的总和大2倍.

Depth 1432 (so 2^1432 terms) suffices for the true sum to exceed the computed sum by a factor of two.

我对如何确定所需的术语数少于两倍的想法有所了解.

I had an idea for how to determine the number of terms needed to less than a factor of two.

我们使用动态编程来回答以下问题:给定深度d和目标浮点和s,具有成对和s2^d非负float16 s的最大真实和是多少? ?

We use dynamic programming to answer the following question: given a depth d and a target floating point sum s, what is the largest true sum of 2^d nonnegative float16s with pairwise sum s?

让该数量为T(d, s).我们复发了

Let that quantity be T(d, s). We get a recurrence

T(0, s) = s,    for all s.
T(d, s) =            max            (T(d-1, a) + T(d-1, b)),    for all d, s.
          a, b : float16(a + b) = s

每个重复步骤将涉及循环遍历2^29个组合(因为我们可以假设a ≤ b,并且负浮点数和特殊值均超出限制),并且所需的深度不会超过10^4左右由汉斯和你的答案.对我来说似乎可行.

Each step of the recurrence will involve looping over about 2^29 combinations (since we can assume a ≤ b, and negative floats and special values are off limits), and the depth required won't exceed 10^4 or so by Hans's and your answer. Seems feasible to me.

DP代码:

#include <algorithm>
#include <cstdio>
#include <vector>

using Float16 = int;
using Fixed = unsigned long long;

static constexpr int kExponentBits = 5;
static constexpr int kFractionBits = 10;
static constexpr Float16 kInfinity = ((1 << kExponentBits) - 1)
                                     << kFractionBits;

Fixed FixedFromFloat16(Float16 a) {
  int exponent = a >> kFractionBits;
  if (exponent == 0) {
    return a;
  }
  Float16 fraction = a - (exponent << kFractionBits);
  Float16 significand = (1 << kFractionBits) + fraction;
  return static_cast<Fixed>(significand) << (exponent - 1);
}

bool Plus(Float16 a, Float16 b, Float16* c) {
  Fixed exact_sum = FixedFromFloat16(a) + FixedFromFloat16(b);
  int exponent = 64 - kFractionBits - __builtin_clzll(exact_sum);
  if (exponent <= 0) {
    *c = static_cast<Float16>(exact_sum);
    return true;
  }
  Fixed ulp = Fixed{1} << (exponent - 1);
  Fixed remainder = exact_sum & (ulp - 1);
  Fixed rounded_sum = exact_sum - remainder;
  if (2 * remainder > ulp ||
      (2 * remainder == ulp && (rounded_sum & ulp) != 0)) {
    rounded_sum += ulp;
  }
  exponent = 64 - kFractionBits - __builtin_clzll(rounded_sum);
  if (exponent >= (1 << kExponentBits) - 1) {
    return false;
  }
  Float16 significand = rounded_sum >> (exponent - 1);
  Float16 fraction = significand - (Float16{1} << kFractionBits);
  *c = (exponent << kFractionBits) + fraction;
  return true;
}

int main() {
  std::vector<Fixed> greatest0(kInfinity);
  for (Float16 a = 0; a < kInfinity; a++) {
    greatest0[a] = FixedFromFloat16(a);
  }
  for (int depth = 1; true; depth++) {
    auto greatest1 = greatest0;
    for (Float16 a = 1; a < kInfinity; a++) {
      Fixed greatest0_a = greatest0[a];
      for (Float16 b = a; b < kInfinity; b++) {
        Float16 c;
        if (!Plus(a, b, &c)) {
          continue;
        }
        Fixed& value = greatest1[c];
        value = std::max(value, greatest0_a + greatest0[b]);
      }
    }

    std::vector<double> ratios;
    ratios.reserve(kInfinity - 1);
    for (Float16 a = 1; a < kInfinity; a++) {
      ratios.push_back(greatest1[a] / static_cast<double>(FixedFromFloat16(a)));
    }
    std::printf("depth %d, ratio = %.17g\n", depth,
                *std::max_element(ratios.begin(), ratios.end()));
    greatest0.swap(greatest1);
  }
}

我将运行它并在完成后发布更新.

I'll run this and post an update when it's done.

这篇关于使用成对求和,我需要多少项才能得出明显错误的结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆