How to perform element-wise left shift with __m128i?

Problem description

The SSE shift instructions I have found can only shift by the same amount on all the elements:

  • _mm_sll_epi32()
  • _mm_slli_epi32()

These shift all elements, but by the same shift amount.

Is there a way to apply different shifts to the different elements? Something like this:

__m128i a, b;

r0 := a0 << b0;
r1 := a1 << b1;
r2 := a2 << b2;
r3 := a3 << b3;

Solution

There exists the _mm_shl_epi32() intrinsic that does exactly that.

http://msdn.microsoft.com/en-us/library/gg445138.aspx

However, it requires the XOP instruction set. Only AMD's Bulldozer-family processors (including the Interlagos Opterons) support this instruction. It is not available on any Intel processor.
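
As a rough illustration of the XOP route, here is a minimal sketch. The helper name shift_left_per_lane is hypothetical, and the sketch assumes every shift count is in the range 0..31. It relies on the __XOP__ macro that GCC and Clang define when building with -mxop; without it, the code falls back to a plain scalar loop so that it still compiles and runs on machines without XOP.

```c
#include <stdint.h>
#include <emmintrin.h>   /* SSE2: __m128i, unaligned loads and stores */
#ifdef __XOP__
#include <x86intrin.h>   /* GCC/Clang umbrella header; MSVC declares XOP in <ammintrin.h> */
#endif

/* Hypothetical helper (not from the original answer): r[i] = a[i] << counts[i].
   Assumes every count is in the range 0..31. */
static __m128i shift_left_per_lane(__m128i a, __m128i counts)
{
#ifdef __XOP__
    /* A single instruction on CPUs that implement XOP. */
    return _mm_shl_epi32(a, counts);
#else
    /* Fallback: spill both vectors to memory and shift each lane in scalar code. */
    uint32_t va[4], vc[4];
    _mm_storeu_si128((__m128i *)va, a);
    _mm_storeu_si128((__m128i *)vc, counts);
    for (int i = 0; i < 4; ++i)
        va[i] <<= vc[i];
    return _mm_loadu_si128((const __m128i *)va);
#endif
}
```

A binary built with -mxop will fault on a CPU that lacks XOP, so a real program would likely want a run-time CPU check before taking that path.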

If you want to do it without XOP instructions, you will need to do it the hard way: Pull them out and do them one by one.

Without XOP instructions, you can do this with SSE4.1 using the following intrinsics:

  • _mm_insert_epi32()
  • _mm_extract_epi32()

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse41_reg_ins_ext.htm

Those will let you extract parts of a 128-bit register into regular registers to do the shift and put them back.
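
The extract/shift/insert approach described above could look roughly like the following sketch. The names shl_each_epi32_sse41, shl_or_zero and SSE41_TARGET are made up for illustration. Treating counts of 32 or more as producing 0 is an assumption of this sketch (a scalar shift by 32 or more is undefined in C), not something the original answer specifies. The target attribute is a GCC/Clang feature that lets the function use SSE4.1 without compiling the whole file with -msse4.1; with other compilers, enable SSE4.1 through the compiler's own options.

```c
#include <stdint.h>
#include <smmintrin.h>   /* SSE4.1: _mm_extract_epi32, _mm_insert_epi32 */

#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* Scalar shift with a guard: counts >= 32 yield 0 (an assumption of this sketch). */
static uint32_t shl_or_zero(uint32_t v, uint32_t count)
{
    return count < 32u ? v << count : 0u;
}

/* Hypothetical helper: shift each 32-bit lane of a left by the count
   held in the same lane of counts. */
SSE41_TARGET
static __m128i shl_each_epi32_sse41(__m128i a, __m128i counts)
{
    /* Pull every lane out into a general-purpose register and shift it there. */
    uint32_t r0 = shl_or_zero((uint32_t)_mm_extract_epi32(a, 0),
                              (uint32_t)_mm_extract_epi32(counts, 0));
    uint32_t r1 = shl_or_zero((uint32_t)_mm_extract_epi32(a, 1),
                              (uint32_t)_mm_extract_epi32(counts, 1));
    uint32_t r2 = shl_or_zero((uint32_t)_mm_extract_epi32(a, 2),
                              (uint32_t)_mm_extract_epi32(counts, 2));
    uint32_t r3 = shl_or_zero((uint32_t)_mm_extract_epi32(a, 3),
                              (uint32_t)_mm_extract_epi32(counts, 3));

    /* Put the shifted values back into a vector, one lane at a time. */
    __m128i r = _mm_cvtsi32_si128((int)r0);
    r = _mm_insert_epi32(r, (int)r1, 1);
    r = _mm_insert_epi32(r, (int)r2, 2);
    r = _mm_insert_epi32(r, (int)r3, 3);
    return r;
}
```

That is eight extracts, four scalar shifts and three inserts for what XOP does in a single instruction. Outside the scope of the original answer: on CPUs with AVX2, _mm_sllv_epi32() also shifts each 32-bit lane by its own count.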

If you go with the latter method, it'll be horrifically inefficient. That's why _mm_shl_epi32() exists in the first place.
