如何使用mach_absolute_time而不溢出? [英] How can I use mach_absolute_time without overflowing?

查看:163
本文介绍了如何使用mach_absolute_time而不溢出?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在达尔文市,POSIX标准clock_gettime(CLOCK_MONOTONIC)计时器不可用.而是通过mach/mach_time.h中的mach_absolute_time函数获得最高分辨率的单调计时器.

On Darwin, the POSIX standard clock_gettime(CLOCK_MONOTONIC) timer is not available. Instead, the highest resolution monotonic timer is obtained through the mach_absolute_time function from mach/mach_time.h.

返回的结果可能是来自处理器的未经调整的滴答计数,在这种情况下,时间单位可能是奇数倍.例如,在时钟频率为33MHz的CPU上,达尔文返回1000000000/33333335作为返回结果的精确单位(即,将mach_absolute_time乘以该分数以获得纳秒级的值).

The result returned may be an unadjusted tick count from the processor, in which case the time units could be a strange multiple. For example, on a CPU with a 33MHz tick count, Darwin returns 1000000000/33333335 as the exact units of the returned result (ie, multiply the mach_absolute_time by that fraction to obtain a nanosecond value).

我们通常希望将精确的刻度转换为标准"(十进制)单位,但是不幸的是,天真的将绝对时间乘以小数,即使在64位算术中也会溢出.这是Apple在mach_absolute_time上的唯一文档的错误(技术问答(Q& A QA1398 ). 1

We usually wish to convert from exact ticks to "standard" (decimal) units, but unfortunately, naively multiplying the absolute time by the fraction will overflow even in 64-bit arithmetic. This is an error that Apple's sole piece of documentation on mach_absolute_time falls into (Technical Q&A QA1398).1

我应该如何编写正确使用mach_absolute_time的函数?

How should I write a function that correctly uses mach_absolute_time?

  1. 请注意,这不是一个理论问题:QA1398中的示例代码完全无法在基于PowerPC的Mac上运行.在Intel Macs上,mach_timebase_info始终返回1/1作为缩放因子,因为CPU的原始滴答计数不可靠(动态速度步进),因此API会为您进行缩放.在PowerPC Mac上,mach_timebase_info返回1000000000/33333335或1000000000/25000000,因此Apple提供的代码肯定每隔几分钟就会溢出.糟糕!
  1. Note that this is not a theoretical problem: the sample code in QA1398 completely fails to work on PowerPC-based Macs. On Intel Macs, mach_timebase_info always returns 1/1 as the scaling factor because the CPU's raw tick count is unreliable (dynamic speed-stepping), so the API does the scaling for you. On PowerPC Macs, mach_timebase_info returns either 1000000000/33333335 or 1000000000/25000000, so Apple's provided code definitely overflows every few minutes. Oops.

推荐答案

最精确(最佳)的答案

以128位精度执行算术运算,以避免溢出!

Most-precise (best) answer

Perform the arithmetic at 128-bit precision to avoid the overflow!

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
    }
    uint64_t scale(uint64_t i) {
      return scaleHighPrecision(i - bias, tb.numer, tb.denom);
    }
    static uint64_t scaleHighPrecision(uint64_t i, uint32_t numer,
                                       uint32_t denom) {
      U64 high = (i >> 32) * numer;
      U64 low = (i & 0xffffffffull) * numer / denom;
      U64 highRem = ((high % denom) << 32) / denom;
      high /= denom;
      return (high << 32) + highRem + low;
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return data.scale(now);
}

一个简单的低分辨率答案

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.  The clock may run up to 0.1% faster or slower
// than the "exact" tick count.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
      if (tb.denom > 1024) {
        double frac = (double)tb.numer/tb.denom;
        tb.denom = 1024;
        tb.numer = tb.denom * frac + 0.5;
        assert(tb.numer > 0);
      }
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return (now - data.bias) * data.tb.numer / data.tb.denom;
}

使用低精度算法但使用连续分数以避免精度损失的巧妙解决方案

// This function returns the rational number inside the given interval with
// the smallest denominator (and smallest numerator breaks ties; correctness
// proof neglects floating-point errors).
static mach_timebase_info_data_t bestFrac(double a, double b) {
  if (floor(a) < floor(b))
  { mach_timebase_info_data_t rv = {(int)ceil(a), 1}; return rv; }
  double m = floor(a);
  mach_timebase_info_data_t next = bestFrac(1/(b-m), 1/(a-m));
  mach_timebase_info_data_t rv = {(int)m*next.numer + next.denum, next.numer};
  return rv;
}

// Returns monotonic time in nanos, measured from the first time the function
// is called in the process.  The clock may run up to 0.1% faster or slower
// than the "exact" tick count. However, although the bound on the error is
// the same as for the pragmatic answer, the error is actually minimized over
// the given accuracy bound.
uint64_t monotonicTimeNanos() {
  uint64_t now = mach_absolute_time();
  static struct Data {
    Data(uint64_t bias_) : bias(bias_) {
      kern_return_t mtiStatus = mach_timebase_info(&tb);
      assert(mtiStatus == KERN_SUCCESS);
      double frac = (double)tb.numer/tb.denom;
      uint64_t spanTarget = 315360000000000000llu; // 10 years
      if (getExpressibleSpan(tb.numer, tb.denom) >= spanTarget)
        return;
      for (double errorTarget = 1/1024.0; errorTarget > 0.000001;) {
        mach_timebase_info_data_t newFrac =
            bestFrac((1-errorTarget)*frac, (1+errorTarget)*frac);
        if (getExpressibleSpan(newFrac.numer, newFrac.denom) < spanTarget)
          break;
        tb = newFrac;
        errorTarget = fabs((double)tb.numer/tb.denom - frac) / frac / 8;
      }
      assert(getExpressibleSpan(tb.numer, tb.denom) >= spanTarget);
    }
    mach_timebase_info_data_t tb;
    uint64_t bias;
  } data(now);
  return (now - data.bias) * data.tb.numer / data.tb.denom;
}

推导

我们的目标是将mach_timebase_info返回的分数减小到基本相同,但分母较小的分数.我们可以处理的时间跨度的大小仅受分母的大小限制,而不是我们乘以的分数分子的限制:

The derivation

We aim to reduce the fraction returned by mach_timebase_info to one that is essentially the same, but with a small denominator. The size of the timespan that we can handle is limited only by the size of the denominator, not the numerator of the fraction we shall multiply by:

uint64_t getExpressibleSpan(uint32_t numer, uint32_t denom) {
  // This is just less than the smallest thing we can multiply numer by without
  // overflowing. ceilLog2(numer) = 64 - number of leading zeros of numer
  uint64_t maxDiffWithoutOverflow = ((uint64_t)1 << (64 - ceilLog2(numer))) - 1;
  return maxDiffWithoutOverflow * numer / denom;
}

如果mach_timebase_info返回的denom=33333335,则仅在乘数溢出之前,我们最多可以处理18秒的差异.如getExpressibleSpan所示,通过为此计算大致的下限,numer的大小无关紧要:将numer减半将maxDiffWithoutOverflow加倍.因此,唯一的目标是产生一个分母较小的接近于数字/分母的分数.最简单的方法是使用连续分数.

If denom=33333335 as returned by mach_timebase_info, we can handle differences of up to 18 seconds only before the multiplication by numer overflows. As getExpressibleSpan shows, by calculating a rough lower bound for this, the size of numer doesn't matter: halving numer doubles maxDiffWithoutOverflow. The only goal therefore is to produce a fraction close to numer/denom that has a smaller denominator. The simplest method to do this is using continued fractions.

连续分数法非常方便.如果提供的间隔包含整数,则bestFrac显然可以正常工作:它返回间隔中大于1的最小整数.否则,它以严格更大的间隔递归调用自身并返回m+1/next.最终结果是一个连续分数,可以通过归纳法显示出它具有正确的属性:它是最佳的,即在给定区间内的分数最小的分数.

The continued fractions method is rather handy. bestFrac clearly works correctly if the provided interval contains an integer: it returns the least integer in the interval over 1. Otherwise, it calls itself recursively with a strictly larger interval and returns m+1/next. The final result is a continued fraction that can be shown by induction to have the correct property: it's optimal, the fraction inside the given interval with the least denominator.

最后,当将mach_absolute_time重新缩放为纳秒时,我们将达尔文传递给我们的分数减小为较小的分数.我们在这里可能会引入错误,因为我们通常无法在不损失准确性的情况下减少分数.我们为自己设定了0.1%误差的目标,并检查是否已将分数减小到足以正确处理常见时间段(最长十年)的水平.

Finally, we reduce the fraction Darwin passes us to a smaller one to use when rescaling the mach_absolute_time to nanoseconds. We may introduce an error here because we can't reduce the fraction in general without losing accuracy. We set ourselves the target of 0.1% error, and check that we've reduced the fraction enough for common timespans (up to ten years) to be handled correctly.

可以说该方法执行起来过于复杂,但是可以正确处理API可以抛出的任何内容,并且所得到的代码仍然很短而且非常快(bestFrac通常只在深三到四次迭代之前递归对于随机间隔[a,a*1.002]返回小于1000的分母.

Arguably the method is over-complicated for what it does, but it handles correctly anything the API can throw at it, and the resulting code is still short and extremely fast (bestFrac typically recurses only three or four iterations deep before returning a denominator less than 1000 for random intervals [a,a*1.002]).

这篇关于如何使用mach_absolute_time而不溢出?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆