将屏蔽的位移到lsb [英] Shift masked bits to the lsb

查看:131
本文介绍了将屏蔽的位移到lsb的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当您带有掩码的某些数据时,您将得到与数据/掩码大小相同的结果。
我想做的是将结果中的被掩码位(掩码中有1)并向右移动,使它们彼此相邻,然后执行CTZ(计数尾随)

When you and some data with a mask you get some result which is of the same size as the data/mask. What I want to do, is to take the masked bits in the result (where there was 1 in the mask) and shift them to the right so they are next to each other and I can perform a CTZ (Count Trailing Zeroes) on them.

我不知道如何命名这样的程序,所以Google让我失败了。操作最好不是循环解决方案,这必须是尽可能快的操作。

I didn't know how to name such a procedure so Google has failed me. The operation should preferably not be a loop solution, this has to be as fast operation as possible.

这是一张令人难以置信的图像MS Paint。

And here is an incredible image made in MS Paint.

推荐答案

此操作称为正确压缩。它是 BMI2 的一部分,作为 PEXT 指令,在Haswell及以后的Intel处理器中使用。

This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.

不幸的是,如果没有硬件支持,这将是一个非常烦人的操作。当然,有一个明显的解决方案,就是将一个位循环移动,这是Hackers Delight给出的:

Unfortunately, without hardware support is it a quite annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop, here is the one given by Hackers Delight:

unsigned compress(unsigned x, unsigned m) {
   unsigned r, s, b;    // Result, shift, mask bit. 

   r = 0; 
   s = 0; 
   do {
      b = m & 1; 
      r = r | ((x & b) << s); 
      s = s + b; 
      x = x >> 1; 
      m = m >> 1; 
   } while (m != 0); 
   return r; 
} 

但是Hackers Delight也提供了另一种方法,它做的更少循环(迭代数以位数为对数),但每次迭代更多:

But there is an other way, also given by Hackers Delight, which does less looping (number of iteration logarithmic in the number of bits) but more per iteration:

unsigned compress(unsigned x, unsigned m) {
   unsigned mk, mp, mv, t; 
   int i; 

   x = x & m;           // Clear irrelevant bits. 
   mk = ~m << 1;        // We will count 0's to right. 

   for (i = 0; i < 5; i++) {
      mp = mk ^ (mk << 1);             // Parallel prefix. 
      mp = mp ^ (mp << 2); 
      mp = mp ^ (mp << 4); 
      mp = mp ^ (mp << 8); 
      mp = mp ^ (mp << 16); 
      mv = mp & m;                     // Bits to move. 
      m = m ^ mv | (mv >> (1 << i));   // Compress m. 
      t = x & mv; 
      x = x ^ t | (t >> (1 << i));     // Compress x. 
      mk = mk & ~mp; 
   } 
   return x; 
}

请注意,其中很多值仅取决于 m 。由于您只有512个不同的掩码,因此您可以对其进行预先计算并将其简化为类似这样的代码(未经测试)

Notice that a lot of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested)

unsigned compress(unsigned x, int maskindex) {
   unsigned t; 
   int i; 

   x = x & masks[maskindex][0];

   for (i = 0; i < 5; i++) {
      t = x & masks[maskindex][i + 1]; 
      x = x ^ t | (t >> (1 << i));
   } 
   return x; 
}

当然,所有这些都可以通过展开而变成非循环 ,第二和第三种方法可能更适合于此。但是,这有点作弊。

Of course all of these can be turned into "not a loop" by unrolling, the second and third ways are probably more suitable for that. That's a bit of cheat however.

这篇关于将屏蔽的位移到lsb的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆