Portable efficient alternative to PDEP without using BMI2?

Views: 136 · Published: 2020/6/3 19:53:20 · Tags: algorithm assembly x86 bit-manipulation bmi

Question

The documentation for the parallel deposit instruction (PDEP) in Intel's Bit Manipulation Instruction Set 2 (BMI2) describes the following serial implementation for the instruction (C-like pseudocode):

U64 _pdep_u64(U64 val, U64 mask) {
  U64 res = 0;
  for (U64 bb = 1; mask; bb += bb) {
    if (val & bb)
      res |= mask & -mask;
    mask &= mask - 1;
  }
  return res;
}

See also Intel's pdep insn ref manual entry.

This algorithm is O(n), where n is the number of set bits in mask, which obviously has a worst case of O(k) where k is the total number of bits in mask.
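To make the loop concrete, here is a minimal runnable sketch of the same serial deposit. The function name `pdep_serial` is chosen here for illustration and is not an Intel intrinsic; `0 - mask` is used in place of `-mask` only to keep compilers quiet about negating an unsigned value.

```c
#include <stdint.h>

/* Serial parallel-deposit: the k-th lowest bit of val is written to the
   position of the k-th lowest set bit of mask. One loop iteration is
   executed per set bit in mask. */
uint64_t pdep_serial(uint64_t val, uint64_t mask)
{
    uint64_t res = 0;
    for (uint64_t bb = 1; mask != 0; bb += bb) {
        if (val & bb)
            res |= mask & (0 - mask);  /* isolate lowest set bit of mask */
        mask &= mask - 1;              /* clear lowest set bit of mask   */
    }
    return res;
}
```

For example, with val = 0x5 (binary 101) and mask = 0x1A (binary 11010), the set bits of mask sit at positions 1, 3 and 4. Bit 0 of val goes to position 1, bit 1 (zero) to position 3, and bit 2 to position 4, so the result is 0x12 (binary 10010).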

Is a more efficient worst case algorithm possible?

Is it possible to make a faster version that assumes that val has at most one bit set, i.e., it either equals 0 or equals 1<<r for some value of r from 0 to 63?

Recommended answer

The second part of the question, about the special case of a 1-bit deposit, requires two steps. In the first step, we need to determine the bit index r of the single 1-bit in val, with a suitable response in case val is zero. This can easily be accomplished via the POSIX function ffs, or if r is known by other means, as alluded to by the asker in comments. In the second step we need to identify bit index i of the r-th 1-bit in mask, if it exists. We can then deposit the r-th bit of val at bit i.
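As a baseline, the two steps can be sketched with plain loops. This is only a reference version for checking results, not the binary-search method; the helper names `single_bit_index`, `nth_set_bit_naive` and `deposit_single_bit_naive` are made up for this illustration.

```c
#include <stdint.h>

/* Step 1 (hypothetical helper): index of the single set bit in val,
   or -1 if val is zero. Assumes val has at most one bit set. */
static int single_bit_index(uint64_t val)
{
    int r = -1;
    while (val != 0) { val >>= 1; r++; }
    return r;
}

/* Step 2 (hypothetical helper): index of the r-th set bit of mask,
   counting from 0 at the least significant end, or -1 if absent. */
static int nth_set_bit_naive(uint64_t mask, int r)
{
    if (r < 0) return -1;
    for (int k = 0; k < r && mask != 0; k++)
        mask &= mask - 1;              /* drop the lowest remaining set bit */
    if (mask == 0) return -1;
    int i = 0;
    while (((mask >> i) & 1) == 0) i++; /* locate the lowest remaining bit */
    return i;
}

/* Deposit the single bit of val at the position of the r-th set bit of mask. */
uint64_t deposit_single_bit_naive(uint64_t val, uint64_t mask)
{
    int r = single_bit_index(val);
    int i = nth_set_bit_naive(mask, r);
    return (i >= 0) ? (uint64_t)1 << i : 0;
}
```

With mask = 0x1A (set bits at positions 1, 3 and 4), val = 1<<2 selects the third set bit and yields 0x10, while val = 1<<3 yields 0 because mask has no fourth set bit.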

One way of finding the index of the r-th 1-bit in mask is to tally the 1-bits using a classical population count algorithm based on binary partitioning, and record all of the intermediate group-wise bit counts. We then perform a binary search on the recorded bit-count data to identify the position of the desired bit.
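The idea is easier to follow at a small width. The sketch below applies it to an 8-bit mask; the function name `find_nth_set_bit8` is hypothetical, and unlike a full-width version it checks the range of n up front rather than at the end.

```c
#include <stdint.h>

/* Index of the n-th (0-based) set bit of an 8-bit mask, or -1 if absent.
   c1, c2, c4 hold the bit counts of every 1-, 2- and 4-bit group;
   c8 is the total count. */
int find_nth_set_bit8(uint8_t mask, int n)
{
    unsigned c1 = mask;                                /* the bits themselves */
    unsigned c2 = (c1 & 0x55u) + ((c1 >> 1) & 0x55u);  /* per 2-bit group     */
    unsigned c4 = (c2 & 0x33u) + ((c2 >> 2) & 0x33u);  /* per nibble          */
    unsigned c8 = (c4 & 0x0fu) + ((c4 >> 4) & 0x0fu);  /* whole byte          */
    int i = n, r = 0;
    unsigned t;

    if (n < 0 || n >= (int)c8) return -1;

    /* Binary search: at each level, t is the count in the lower half of
       the current group. If the wanted bit is not among those t bits,
       skip the lower half and reduce the remaining rank i by t. */
    t = (c4     ) & 0x7u; if (i >= (int)t) { r += 4; i -= (int)t; }
    t = (c2 >> r) & 0x3u; if (i >= (int)t) { r += 2; i -= (int)t; }
    t = (c1 >> r) & 0x1u; if (i >= (int)t) { r += 1; }
    return r;
}
```

For mask = 0x1A the recorded counts are c2 = 0x15, c4 = 0x12 and c8 = 3. Looking for n = 2: the low nibble holds 2 set bits, so the search moves to the high nibble with i = 0, and both remaining steps stay in the lower half, giving index 4.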

The following C-code demonstrates this using 64-bit data. Whether this is actually faster than the iterative method will very much depend on typical values of mask and val.

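#include <stdint.h>
/* ffsll() is a non-standard extension used by find_bit_index() below.
   glibc declares it in <string.h> (older versions may need _GNU_SOURCE);
   BSD-derived systems declare it in <strings.h>. */
#include <string.h>
#include <strings.h>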

/* Find the index of the n-th 1-bit in mask, n >= 0
   The index of the least significant bit is 0 
   Return -1 if there is no such bit
*/
int find_nth_set_bit (uint64_t mask, int n)
{
    int t, i = n, r = 0;
    const uint64_t m1 = 0x5555555555555555ULL; // even bits
    const uint64_t m2 = 0x3333333333333333ULL; // even 2-bit groups
    const uint64_t m4 = 0x0f0f0f0f0f0f0f0fULL; // even nibbles
    const uint64_t m8 = 0x00ff00ff00ff00ffULL; // even bytes
    uint64_t c1 = mask;
    uint64_t c2 = c1 - ((c1 >> 1) & m1);
    uint64_t c4 = ((c2 >> 2) & m2) + (c2 & m2);
    uint64_t c8 = ((c4 >> 4) + c4) & m4;
    uint64_t c16 = ((c8 >> 8) + c8) & m8;
    uint64_t c32 = (c16 >> 16) + c16;
    int c64 = (int)(((c32 >> 32) + c32) & 0x7f);
    t = (c32    ) & 0x3f; if (i >= t) { r += 32; i -= t; }
    t = (c16>> r) & 0x1f; if (i >= t) { r += 16; i -= t; }
    t = (c8 >> r) & 0x0f; if (i >= t) { r +=  8; i -= t; }
    t = (c4 >> r) & 0x07; if (i >= t) { r +=  4; i -= t; }
    t = (c2 >> r) & 0x03; if (i >= t) { r +=  2; i -= t; }
    t = (c1 >> r) & 0x01; if (i >= t) { r +=  1;         }
    if (n >= c64) r = -1;
    return r; 
}

/* val is either zero or has a single 1-bit.
   Return -1 if val is zero, otherwise the index of the 1-bit
   The index of the least significant bit is 0
*/
int find_bit_index (uint64_t val)
{
    return ffsll (val) - 1;
}

uint64_t deposit_single_bit (uint64_t val, uint64_t mask)
{
    uint64_t res = (uint64_t)0;
    int r = find_bit_index (val);
    if (r >= 0) {
        int i = find_nth_set_bit (mask, r);
        if (i >= 0) res = (uint64_t)1 << i;
    } 
    return res;
}
