Permuting bytes inside SSE __m128i register
Problem description
I have the following problem: an __m128i register holds 16 8-bit values in the following order:
[ 1, 5, 9, 13 ] [ 2, 6, 10, 14] [3, 7, 11, 15] [4, 8, 12, 16]
What I would like to achieve is to shuffle the bytes efficiently to get this ordering:
[ 1, 2, 3, 4 ] [ 5, 6, 7, 8] [9, 10, 11, 12] [13, 14, 15, 16]
This is analogous to a 4x4 matrix transposition, but operating on 8-bit elements inside a single register.
Could you please point me to the SSE instructions (preferably <= SSE2) that are suitable for this?
Recommended answer
You really will want to use SSSE3 for this; it is much cleaner than trying to stay at <= SSE2.
Your code will look like this:
#include <tmmintrin.h> // SSSE3: _mm_shuffle_epi8
#include <emmintrin.h> // SSE2: _mm_set_epi8
...
// check if your hardware supports SSSE3
...
__m128i mask = _mm_set_epi8(15, 11, 7, 3,
14, 10, 6, 2,
13, 9, 5, 1,
12, 8, 4, 0);
__m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
15, 11, 7, 3,
14, 10, 6, 2,
13, 9, 5, 1);
mtrx = _mm_shuffle_epi8(mtrx, mask);
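The "check if your hardware supports SSSE3" step is left open in the snippet above. What follows is a minimal sketch of one possible way to do it, assuming GCC or Clang on x86: `__builtin_cpu_supports` and the `target` attribute are compiler-specific, and MSVC would need something like `__cpuid` instead. The helper names `has_ssse3` and `transpose4x4_u8_ssse3` are hypothetical and are not part of the original answer.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_set_epi8 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Hypothetical helper: nonzero if the running CPU reports SSSE3.
   Relies on GCC/Clang builtins (an assumption about the toolchain). */
static int has_ssse3(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
}

/* Hypothetical helper: treats 16 bytes as a 4x4 matrix and transposes it.
   The target attribute lets this one function use SSSE3 even when the
   rest of the file is compiled without -mssse3. */
static __attribute__((target("ssse3")))
void transpose4x4_u8_ssse3(const uint8_t in[16], uint8_t out[16])
{
    /* Same control mask as above: result byte i takes source byte mask[i]. */
    const __m128i mask = _mm_set_epi8(15, 11, 7, 3,
                                      14, 10, 6, 2,
                                      13,  9, 5, 1,
                                      12,  8, 4, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, mask));
}
```

A caller would test `has_ssse3()` once and fall back to an SSE2 path when it returns zero.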
If you really want SSE2, this will suffice
(assuming I'm interpreting your initial ordering correctly):
__m128i mask = _mm_set_epi8(0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF);
__m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
15, 11, 7, 3,
14, 10, 6, 2,
13, 9, 5, 1); // [1, 5, 9, 13] [2, 6, 10, 14] [3, 7, 11, 15] [ 4, 8, 12, 16]
mtrx = _mm_packus_epi16(_mm_and_si128(mtrx, mask), _mm_srli_epi16(mtrx, 8)); // [1, 9, 2, 10] [3, 11, 4, 12] [5, 13, 6, 14] [ 7, 15, 8, 16]
mtrx = _mm_packus_epi16(_mm_and_si128(mtrx, mask), _mm_srli_epi16(mtrx, 8)); // [1, 2, 3, 4] [5, 6, 7, 8] [9, 10, 11, 12] [13, 14, 15, 16]
Or, in a form that is easier to debug:
__m128i mtrx = _mm_set_epi8(16, 12, 8, 4,
15, 11, 7, 3,
14, 10, 6, 2,
13, 9, 5, 1); // [1, 5, 9, 13] [ 2, 6, 10, 14] [ 3, 7, 11, 15] [ 4, 8, 12, 16]
__m128i mask = _mm_set_epi8(0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF,
0x00, 0xFF, 0x00, 0xFF);
__m128i temp = _mm_srli_epi16(mtrx, 8); // [5, 0, 13, 0] [ 6, 0, 14, 0] [ 7, 0, 15, 0] [ 8, 0, 16, 0]
mtrx = _mm_and_si128(mtrx, mask); // [1, 0, 9, 0] [ 2, 0, 10, 0] [ 3, 0, 11, 0] [ 4, 0, 12, 0]
mtrx = _mm_packus_epi16(mtrx, temp); // [1, 9, 2, 10] [ 3, 11, 4, 12] [ 5, 13, 6, 14] [ 7, 15, 8, 16]
temp = _mm_srli_epi16(mtrx, 8); // [9, 0, 10, 0] [11, 0, 12, 0] [13, 0, 14, 0] [15, 0, 16, 0]
mtrx = _mm_and_si128(mtrx, mask); // [1, 0, 2, 0] [ 3, 0, 4, 0] [ 5, 0, 6, 0] [ 7, 0, 8, 0]
mtrx = _mm_packus_epi16(mtrx, temp); // [1, 2, 3, 4] [ 5, 6, 7, 8] [ 9, 10, 11, 12] [13, 14, 15, 16]
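Both SSE2 rounds above perform the same step: keep the even-indexed bytes, keep the odd-indexed bytes, and pack the two halves together. A minimal sketch that factors this step into a function and works on memory buffers might look as follows. The names `deinterleave_bytes_sse2` and `transpose4x4_u8_sse2` are hypothetical, and `_mm_set1_epi16(0x00FF)` is swapped in here for the `_mm_set_epi8` mask of the answer (it builds the same bit pattern).

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: moves the even-indexed bytes of v into the low
   8 bytes and the odd-indexed bytes into the high 8 bytes. */
static __m128i deinterleave_bytes_sse2(__m128i v)
{
    const __m128i low_byte = _mm_set1_epi16(0x00FF); /* 0xFF, 0x00 repeated */
    __m128i even = _mm_and_si128(v, low_byte);       /* low byte of each 16-bit lane */
    __m128i odd  = _mm_srli_epi16(v, 8);             /* high byte of each 16-bit lane */
    /* Every lane is now in 0..255, so the unsigned saturation in
       packus never changes a value. */
    return _mm_packus_epi16(even, odd);
}

/* Hypothetical helper: transposes 16 bytes viewed as a 4x4 matrix by
   applying the deinterleave step twice, as in the answer. */
static void transpose4x4_u8_sse2(const uint8_t in[16], uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    v = deinterleave_bytes_sse2(v);
    v = deinterleave_bytes_sse2(v);
    _mm_storeu_si128((__m128i *)out, v);
}
```

The unaligned load and store are used so that the buffers need no particular alignment; with 16-byte aligned data, `_mm_load_si128` and `_mm_store_si128` could be used instead.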