如何有效地随机播放位? [英] How can I shuffle bits efficiently?

查看:70
本文介绍了如何有效地随机播放位?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要对16位无符号整数进行混洗,以使偶数索引位于低位字节,奇数索引位于高位字节.

I need to shuffle a 16 bit unsigned integer in a way that the even indexes land in the lower byte, and the odd indexes land in the upper byte.

input:
fedcba98 76543210 (contiguously numbered)

output:
fdb97531 eca86420 (even and odd separated)

此刻我的代码如下:

typedef unsigned short u16;

u16 segregate(u16 x)
{
    u16 g = (x & 0x0001);
    u16 h = (x & 0x0004) >> 1;
    u16 i = (x & 0x0010) >> 2;
    u16 j = (x & 0x0040) >> 3;
    u16 k = (x & 0x0100) >> 4;
    u16 l = (x & 0x0400) >> 5;
    u16 m = (x & 0x1000) >> 6;
    u16 n = (x & 0x4000) >> 7;

    u16 o = (x & 0x0002) << 7;
    u16 p = (x & 0x0008) << 6;
    u16 q = (x & 0x0020) << 5;
    u16 r = (x & 0x0080) << 4;
    u16 s = (x & 0x0200) << 3;
    u16 t = (x & 0x0800) << 2;
    u16 u = (x & 0x2000) << 1;
    u16 v = (x & 0x8000);

    return g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v;
}

我想知道是否有比仅提取和移动每个位更优雅的解决方案?

I wonder if there is a more elegant solution than simply extracting and shifting each individual bit?

推荐答案

有一个非常方便的Web资源,可以帮助解决许多位置换问题:

There is a very convenient web resource that helps solving many bit permutation problems: Code generator for bit permutations. In this particular case feeding "0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15" to this page produces pretty fast code.

不幸的是,此代码生成器无法生成64位代码(尽管任何人都可以下载源代码并添加此选项).因此,如果我们需要使用64位指令并行执行4个置换,则必须手动将所有涉及的位掩码扩展为64位:

Unfortunately this code generator cannot produce 64-bit code (though anyone could download sources and add this option). So if we need to perform 4 permutations in parallel using 64-bit instructions, we have to extend all involved bitmasks to 64 bits manually:

uint64_t bit_permute_step(uint64_t x, uint64_t m, unsigned shift) {
  uint64_t t;
  t = ((x >> shift) ^ x) & m;
  x = (x ^ t) ^ (t << shift);
  return x;
}

uint64_t segregate4(uint64_t x)
{ // generated by http://programming.sirrida.de/calcperm.php, extended to 64-bit
  x = bit_permute_step(x, 0x2222222222222222ull, 1);
  x = bit_permute_step(x, 0x0c0c0c0c0c0c0c0cull, 2);
  x = bit_permute_step(x, 0x00f000f000f000f0ull, 4);
  return x;
}

使用SSE指令可以进一步提高并行度(一次8或16个排列). (最新版本的gcc可以自动将这些代码向量化.)

Level of parallelism could be increased even more (8 or 16 permutations at once) with SSE instructions. (And recent versions of gcc can vectorize this code automatically).

如果不需要并行性,并且程序的其他部分未广泛使用数据缓存,则更好的选择是使用查找表.其他答案中已经讨论了各种LUT方法,在这里还可以说一些其他方法:

If parallelism is not required and data cache is not extensively used by other parts of the program, better alternative would be to use lookup table. Various LUT approacehes are already discussed in other answers, still some more could be said here:

  1. 16位字的第一位和最后一位永远不会被置换,我们只需要对1..14位进行混洗.因此(如果我们要通过单个LUT访问来执行任务),拥有一个具有16K条目的LUT就足够了,这意味着32K内存.
  2. 我们可以将表查找和计算方法结合起来.在单个256字节表中进行两次查找可能会分别对每个源字节进行混洗.此后,我们只需要交换两个中间的4位半字节.这样可以使查找表保持较小,仅使用2次内存访问,并且不需要太多的计算(即余额计算和内存访问).

这是第二种方法的实现:

Here is implementation of second approach:

#define B10(x)          x+0x00,      x+0x10,      x+0x01,      x+0x11
#define B32(x)      B10(x+0x00), B10(x+0x20), B10(x+0x02), B10(x+0x22)
#define B54(x)      B32(x+0x00), B32(x+0x40), B32(x+0x04), B32(x+0x44)
uint8_t lut[256] = {B54(  0x00), B54(  0x80), B54(  0x08), B54(  0x88)};
#undef B54
#undef B32
#undef B10

uint_fast16_t segregateLUT(uint_fast16_t x)
{
  uint_fast16_t low = lut[x & 0x00ff];
  low |= low << 4;
  uint_fast16_t high = lut[x >> 8] << 4;
  high |= high << 4;
  return (low & 0x0f0f) | (high & 0xf0f0);
}

但是最快的方法(如果不考虑可移植性)是使用BMI2指令集> Bil2指令集中的pext指令,如Nils Pipenbrinck .使用一对64位pext,我们可以并行执行4个16位洗牌.由于pext指令正是针对这种位置换的,所以这种方法很容易胜过其他所有方法.

But fastest approach (if portability is not an issue) is using pext instruction from BMI2 instruction set as noted by Nils Pipenbrinck. With a pair of 64-bit pext we could perform 4 16-bit shuffles in parallel. Since pext instruction is intended exactly for this kind of bit permutations, this approach easily outperforms all others.

这篇关于如何有效地随机播放位?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆