如何有效地计算小于或等于给定数字的2的最高幂? [英] How to efficiently count the highest power of 2 that is less than or equal to a given number?

查看:42
本文介绍了如何有效地计算小于或等于给定数字的2的最高幂?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

到目前为止,我提出了三种解决方案:

I came up with three solutions so far:

效率极低的标准库 pow log2 函数:

The extremely inefficient standard library pow and log2 functions:

int_fast16_t powlog(uint_fast16_t n)
{
  return static_cast<uint_fast16_t>(pow(2, floor(log2(n))));
}

计算2的次幂的效率更高,直到我达到的数字超出了必须达到的水平:

Far more efficient counting subsequent powers of 2 until I reach a greater number than I had to reach:

uint_fast16_t multiply(uint_fast16_t n)
{
  uint_fast16_t maxpow = 1;
  while(2*maxpow <= n)
    maxpow *= 2;
  return maxpow;
}

到目前为止,最有效的bin搜索2的幂的预计算表:

The most efficient so far binsearching a precomputed table of powers of 2:

uint_fast16_t binsearch(uint_fast16_t n)
{
  static array<uint_fast16_t, 20> pows {1,2,4,8,16,32,64,128,256,512,
    1024,2048,4096,8192,16384,32768,65536,131072,262144,524288};

  return *(upper_bound(pows.begin(), pows.end(), n)-1);
}

这可以进一步优化吗?在这里可以使用任何技巧吗?

Can this be optimized even more? Any tricks that could be used here?

我使用的完全基准测试

#include <iostream>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <array>
#include <algorithm>
using namespace std;
using namespace chrono;

uint_fast16_t powlog(uint_fast16_t n)
{
  return static_cast<uint_fast16_t>(pow(2, floor(log2(n))));
}

uint_fast16_t multiply(uint_fast16_t n)
{
  uint_fast16_t maxpow = 1;
  while(2*maxpow <= n)
    maxpow *= 2;
  return maxpow;
}

uint_fast16_t binsearch(uint_fast16_t n)
{
  static array<uint_fast16_t, 20> pows {1,2,4,8,16,32,64,128,256,512,
    1024,2048,4096,8192,16384,32768,65536,131072,262144,524288};

  return *(upper_bound(pows.begin(), pows.end(), n)-1);
}

high_resolution_clock::duration test(uint_fast16_t(powfunct)(uint_fast16_t))
{
  auto tbegin = high_resolution_clock::now();
  volatile uint_fast16_t sink;
  for(uint_fast8_t i = 0; i < UINT8_MAX; ++i)
    for(uint_fast16_t n = 1; n <= 999999; ++n)
      sink = powfunct(n);
  auto tend = high_resolution_clock::now();
  return tend - tbegin;
}

int main()
{
  cout << "Pow and log took " << duration_cast<milliseconds>(test(powlog)).count() << " milliseconds." << endl;
  cout << "Multiplying by 2 took " << duration_cast<milliseconds>(test(multiply)).count() << " milliseconds." << endl;
  cout << "Binsearching precomputed table of powers took " << duration_cast<milliseconds>(test(binsearch)).count() << " milliseconds." << endl;
}

使用 -O2 进行编译,这在我的笔记本电脑上给出了以下结果:

Compiled with -O2 this gave the following results on my laptop:

Pow and log took 19294 milliseconds.
Multiplying by 2 took 2756 milliseconds.
Binsearching precomputed table of powers took 2278 milliseconds.

推荐答案

注释中已经建议了带有内在函数的版本,因此这里是一个不依赖于它们的版本:

Versions with intrinsics have already been suggested in the comments, so here's a version that does not rely on them:

uint32_t highestPowerOfTwoIn(uint32_t x)
{
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x ^ (x >> 1);
}

这是通过首先在右侧涂抹"最高的设置位,然后 x ^(x>> 1)来进行的,仅保留与直接位于它们左侧的位不同的位(MSB的左边被认为是0),它只是最高的设置位,因为由于拖尾,数字的形式为0 n 1 m (使用字符串表示法,而不是数字幂).

This works by first "smearing" the highest set bit to the right, and then x ^ (x >> 1) keeps only the bits that differ from the bit directly left of them (the msb is considered to have a 0 to left of it), which is only the highest set bit because thanks to the smearing the number is of the form 0n1m (in string notation, not numerical exponentiation).

由于实际上没有人发布它,因此您可以编写具有内在函数(GCC,Clang)的

Since no one is actually posting it, with intrinsics you could write (GCC, Clang)

uint32_t highestPowerOfTwoIn(uint32_t x)
{
  return 0x80000000 >> __builtin_clz(x);
}

或者(可能是未经测试的MSVC)

Or (MSVC, probably, not tested)

uint32_t highestPowerOfTwoIn(uint32_t x)
{
  unsigned long index;
  // ignoring return value, assume x != 0
  _BitScanReverse(&index, x);
  return 1u << index;
}

在目标硬件直接支持的情况下,应该更好.

Which, when directly supported by the target hardware, should be better.

关于coliru的结果

Results on coliru, and latency results on coliru (compare with the baseline too, which should be roughly indicative of the overhead). In the latency result, the first version of highestPowerOfTwoIn doesn't look so good anymore (still OK, but it is a long chain of dependent instructions so it's not a big surprise that it widens the gap with the intrinsics version). Which one of these is the most relevant comparison depends on your actual usage.

如果您有一些奇特的硬件具有快速的位反转操作(但可能是缓慢的移位或慢的 clz ),我们将其称为 _rbit ,那么您可以做到

If you have some odd hardware with a fast bit-reversal operation (but maybe slow shifts or slow clz), let's call it _rbit, then you can do

uint32_t highestPowerOfTwoIn(uint32_t x)
{
  x = _rbit(x);
  return _rbit(x & -x);
}

这当然是基于旧的 x&-x 隔离最低的设置位,周围是位反转,则隔离最高的设置位.

This is of course based on the old x & -x which isolates the lowest set bit, surrounded by bit reversals it's isolating the highest set bit.

这篇关于如何有效地计算小于或等于给定数字的2的最高幂?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆