Permuting bytes inside an SSE __m128i register


Question

I have the following problem:

In an __m128i register there are 16 8-bit values in the following order:

[ 1, 5, 9, 13 ] [ 2, 6, 10, 14] [3, 7, 11, 15]  [4, 8, 12, 16]

What I would like to do is shuffle the bytes efficiently to get this order:

[ 1, 2, 3, 4 ] [ 5, 6, 7, 8] [9, 10, 11, 12]  [13, 14, 15, 16]

This is analogous to a 4x4 matrix transposition, but operating on 8-bit elements inside a single register.

Could you please point me to which SSE instructions (preferably <= SSE2) are suitable for this?

Answer

You will really want to use SSSE3 for this; it is much cleaner than trying to stay at SSE2 or below.

Your code would look like this:

   #include <tmmintrin.h> // _mm_shuffle_epi8
   #include <emmintrin.h> // _mm_set_epi8
   ...
   // check if your hardware supports SSSE3
   ...
   __m128i mask = _mm_set_epi8(15, 11, 7, 3,
                               14, 10, 6, 2,
                               13,  9, 5, 1,
                               12,  8, 4, 0);
   __m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
                               15, 11, 7, 3,
                               14, 10, 6, 2,
                               13,  9, 5, 1);
   mtrx         = _mm_shuffle_epi8(mtrx, mask);
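
The snippet above elides the hardware check and works on a hard-coded register. As a minimal sketch of how the pieces might fit together, the following wraps the same shuffle mask in a function that takes arbitrary input. The function names are hypothetical, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, so this assumes one of those compilers on x86; other compilers need a different detection mechanism.

```c
#include <stdint.h>
#include <emmintrin.h>  // SSE2: _mm_set_epi8, _mm_loadu_si128, _mm_storeu_si128
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8

// Hypothetical helper: treats 16 bytes as a row-major 4x4 matrix and
// transposes it. The target attribute (GCC/Clang only) enables SSSE3
// for this one function, so no global -mssse3 flag is assumed.
__attribute__((target("ssse3")))
static void transpose4x4_u8_ssse3(const uint8_t in[16], uint8_t out[16])
{
    // Output byte i is taken from input byte mask[i];
    // _mm_set_epi8 lists its arguments from byte 15 down to byte 0.
    const __m128i mask = _mm_set_epi8(15, 11, 7, 3,
                                      14, 10, 6, 2,
                                      13,  9, 5, 1,
                                      12,  8, 4, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = _mm_shuffle_epi8(v, mask);
    _mm_storeu_si128((__m128i *)out, v);
}

// Hypothetical wrapper: returns 1 and fills out[] when the CPU reports
// SSSE3, otherwise returns 0 and leaves out[] untouched.
static int transpose4x4_u8_checked(const uint8_t in[16], uint8_t out[16])
{
    if (!__builtin_cpu_supports("ssse3"))
        return 0;
    transpose4x4_u8_ssse3(in, out);
    return 1;
}
```

When the wrapper returns 0, a caller could fall back to an SSE2-only path.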

If you really want SSE2, this will suffice
(assuming I'm interpreting your initial ordering correctly):

  __m128i mask = _mm_set_epi8(0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF);
  __m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
                              15, 11, 7, 3,
                              14, 10, 6, 2,
                              13,  9, 5, 1);                                   // [1, 5, 9, 13] [2,  6, 10, 14] [3,  7, 11, 15] [ 4,  8, 12, 16]
  mtrx = _mm_packus_epi16(_mm_and_si128(mtrx, mask), _mm_srli_epi16(mtrx, 8)); // [1, 9, 2, 10] [3, 11,  4, 12] [5, 13,  6, 14] [ 7, 15,  8, 16]
  mtrx = _mm_packus_epi16(_mm_and_si128(mtrx, mask), _mm_srli_epi16(mtrx, 8)); // [1, 2, 3,  4] [5,  6,  7,  8] [9, 10, 11, 12] [13, 14, 15, 16]

Or, easier to debug:

  __m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
                              15, 11, 7, 3,
                              14, 10, 6, 2,
                              13, 9, 5, 1);            // [1, 5,  9, 13] [ 2,  6, 10, 14] [ 3,  7, 11, 15] [ 4,  8, 12, 16]
  __m128i mask = _mm_set_epi8(0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF,
                              0x00, 0xFF, 0x00, 0xFF);
  __m128i temp = _mm_srli_epi16(mtrx, 8);              // [5, 0, 13,  0] [ 6,  0, 14,  0] [ 7,  0, 15,  0] [ 8,  0, 16,  0]
  mtrx         = _mm_and_si128(mtrx, mask);            // [1, 0,  9,  0] [ 2,  0, 10,  0] [ 3,  0, 11,  0] [ 4,  0, 12,  0]
  mtrx         = _mm_packus_epi16(mtrx, temp);         // [1, 9,  2, 10] [ 3, 11,  4, 12] [ 5, 13,  6, 14] [ 7, 15,  8, 16]
  temp         = _mm_srli_epi16(mtrx, 8);              // [9, 0, 10,  0] [11,  0, 12,  0] [13,  0, 14,  0] [15,  0, 16,  0]
  mtrx         = _mm_and_si128(mtrx, mask);            // [1, 0,  2,  0] [ 3,  0,  4,  0] [ 5,  0,  6,  0] [ 7,  0,  8,  0] 
  mtrx         = _mm_packus_epi16(mtrx, temp);         // [1, 2,  3,  4] [ 5,  6,  7,  8] [ 9, 10, 11, 12] [13, 14, 15, 16]
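
The two mask/shift/pack rounds above can also be written as a loop over arbitrary input. The following is a sketch under a few assumptions: the function name is hypothetical, and `_mm_set1_epi16(0x00FF)` is swapped in for the `_mm_set_epi8` mask (it produces the same bit pattern without passing 0xFF as a `char`). Although `_mm_packus_epi16` saturates, both of its inputs here hold values in 0..255 in every 16-bit lane, so no byte is altered by the pack.

```c
#include <stdint.h>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: transposes 16 bytes viewed as a row-major 4x4
// matrix using only SSE2, by applying the same step twice.
static void transpose4x4_u8_sse2(const uint8_t in[16], uint8_t out[16])
{
    // 0x00FF in every 16-bit lane keeps the even-indexed bytes.
    const __m128i mask = _mm_set1_epi16(0x00FF);
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    for (int pass = 0; pass < 2; ++pass) {
        __m128i even = _mm_and_si128(v, mask);  // bytes 0, 2, 4, ... as 16-bit lanes
        __m128i odd  = _mm_srli_epi16(v, 8);    // bytes 1, 3, 5, ... as 16-bit lanes
        // Low 8 output bytes come from 'even', high 8 from 'odd'.
        v = _mm_packus_epi16(even, odd);
    }

    _mm_storeu_si128((__m128i *)out, v);
}
```

Each pass moves the even-indexed bytes to the low half and the odd-indexed bytes to the high half; doing this twice leaves byte 4*c + r of the input at position 4*r + c of the output, which is the 4x4 transpose.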
