Extracting SSE shuffled 32 bit value with only SSE2

Question

I am trying to extract 4 bytes out of a 128-bit register in an efficient way. The problem is that each value sits in a separate 32-bit lane: {120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0}. I want to transform the 128 bits into 32 bits of the form {120,55,42,120}.

The "raw" code looks like this:

__m128i byte_result_vec={120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0};
unsigned char * byte_result_array=(unsigned char*)&byte_result_vec;
result_array[x]=byte_result_array[0];
result_array[x+1]=byte_result_array[4];
result_array[x+2]=byte_result_array[8];
result_array[x+3]=byte_result_array[12];  

My SSSE3 code is:

unsigned int * result_array=...;
__m128i byte_result_vec={120,0,0,0,55,0,0,0,42,0,0,0,120,0,0,0};
const __m128i eight_bit_shuffle_mask=_mm_set_epi8(1,1,1,1,1,1,1,1,1,1,1,1,0,4,8,12);    
byte_result_vec=_mm_shuffle_epi8(byte_result_vec,eight_bit_shuffle_mask);
unsigned int * byte_result_array=(unsigned int*)&byte_result_vec;
result_array[x]=byte_result_array[0];

How can I do this efficiently with SSE2? Is there a better version with SSSE3 or SSE4?

Recommended answer

You can look at a previous answer of mine for some solutions to this and the reverse operation.

In particular in SSE2 you can do it by first packing the 32-bit integers into signed 16-bit integers and saturating:

byte_result_vec = _mm_packs_epi32(byte_result_vec, byte_result_vec);

Then we pack those 16-bit values into unsigned 8-bit values using unsigned saturation:

byte_result_vec = _mm_packus_epi16(byte_result_vec, byte_result_vec);

We can then finally take our values from the lower 32-bits of the register:

int int_result = _mm_cvtsi128_si32(byte_result_vec);
unsigned char* byte_result_array = (unsigned char*)&int_result;
result_array[x]   = byte_result_array[0];
result_array[x+1] = byte_result_array[1];
result_array[x+2] = byte_result_array[2];
result_array[x+3] = byte_result_array[3];
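
The three steps above can be put together as one small function. This is only a sketch: the name store_low_bytes_sse2 and the out parameter are made up for illustration, and it assumes that the upper three bytes of every 32-bit lane are already zero. It uses memcpy instead of the pointer cast to copy the four packed bytes out.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): writes the low byte
   of each 32-bit lane of v to out[0..3].
   Assumption: the upper three bytes of every lane are zero. */
static void store_low_bytes_sse2(__m128i v, uint8_t *out)
{
    v = _mm_packs_epi32(v, v);   /* 4 x int32 -> 8 x int16, signed saturation   */
    v = _mm_packus_epi16(v, v);  /* 8 x int16 -> 16 x uint8, unsigned saturation */
    int packed = _mm_cvtsi128_si32(v);    /* low 32 bits of the register */
    memcpy(out, &packed, sizeof packed);  /* x86 is little-endian: A, B, C, D */
}
```

In the question's setting this would be called as store_low_bytes_sse2(byte_result_vec, &result_array[x]), with result_array being the unsigned char array from the first snippet.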

The above assumes that the 8-bit values are initially in the low bytes of their respective 32-bit words and that the rest is filled with 0s, since otherwise they will get clamped during the saturating packing process. Thus the operations are the following:

             byte   15                               0
                    0 0 0 D  0 0 0 C  0 0 0 B  0 0 0 A

_mm_packs_epi32 ->  0 D 0 C  0 B 0 A  0 D 0 C  0 B 0 A

_mm_packus_epi16 -> D C B A  D C B A  D C B A  D C B A
                                               ^^^^^^^

_mm_cvtsi128_si32 -> int DCBA, laid out in x86 memory as bytes A B C D

-> reinterpreted as unsigned char array { A, B, C, D }

If the uninteresting bytes are not filled with 0s initially, you have to mask them away beforehand:

byte_result_vec = _mm_and_si128(byte_result_vec, _mm_set1_epi32(0x000000FF));
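
As a sketch of how the mask fits in front of the two packs (the function name store_low_bytes_masked_sse2 is hypothetical), the variant below accepts lanes whose upper bytes contain arbitrary data. Without the AND, a lane such as 0x12345678 would saturate in _mm_packs_epi32 instead of yielding 0x78.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: like the plain pack sequence, but tolerates garbage
   in the upper three bytes of each 32-bit lane by masking it off first. */
static void store_low_bytes_masked_sse2(__m128i v, uint8_t *out)
{
    v = _mm_and_si128(v, _mm_set1_epi32(0x000000FF)); /* keep only byte 0 of each lane */
    v = _mm_packs_epi32(v, v);   /* every lane is now 0..255, so nothing saturates */
    v = _mm_packus_epi16(v, v);
    int packed = _mm_cvtsi128_si32(v);
    memcpy(out, &packed, sizeof packed);
}
```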

Or if the interesting bytes are initially in the high bytes, you have to shift them into the low bytes beforehand:

byte_result_vec = _mm_srli_epi32(byte_result_vec, 24);
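
A sketch of that high-byte case follows, with the hypothetical name store_high_bytes_sse2. Because _mm_srli_epi32 is a logical shift that fills with zeros, the lower three bytes of each lane are discarded along the way and no separate mask is needed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the HIGH byte of each 32-bit lane of v
   to out[0..3]. */
static void store_high_bytes_sse2(__m128i v, uint8_t *out)
{
    v = _mm_srli_epi32(v, 24);   /* logical shift: byte 3 -> byte 0, zero fill */
    v = _mm_packs_epi32(v, v);   /* lanes are 0..255, so no saturation occurs  */
    v = _mm_packus_epi16(v, v);
    int packed = _mm_cvtsi128_si32(v);
    memcpy(out, &packed, sizeof packed);
}
```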

Or, if you actually want { D, C, B, A } (which is not completely clear to me from your question), then this amounts to just switching the array indices in the assignments (or alternatively performing a 32-bit shuffle (_mm_shuffle_epi32) on the initial SSE register beforehand).
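
The shuffle variant could look like the sketch below. The name store_low_bytes_reversed_sse2 is hypothetical, and the code again assumes zero-filled upper bytes. _MM_SHUFFLE(0, 1, 2, 3) reverses the order of the four 32-bit lanes before packing.

```c
#include <xmmintrin.h>  /* _MM_SHUFFLE macro */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the low bytes of the four lanes in reversed
   order, i.e. { D, C, B, A }.
   Assumption: the upper three bytes of every lane are zero. */
static void store_low_bytes_reversed_sse2(__m128i v, uint8_t *out)
{
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); /* lane 0 <- lane 3, ..., lane 3 <- lane 0 */
    v = _mm_packs_epi32(v, v);
    v = _mm_packus_epi16(v, v);
    int packed = _mm_cvtsi128_si32(v);
    memcpy(out, &packed, sizeof packed);
}
```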
