Looking for an SSE 128-bit shift operation for a non-immediate shift value


Problem description

The intrinsic _mm_slli_si128 does a logical left shift of a 128-bit register, but it is restricted to immediate shift values, and it shifts by bytes, not bits.

I can use an intrinsic like _mm_sll_epi64 or _mm_sll_epi32 to shift left a set of values within the __m128i register, but these don't carry the "overflow" bits from one element into the next.
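A minimal sketch of the lost-carry behavior described above. The helper name carry_is_lost is hypothetical and exists only for illustration; it shifts a value whose only set bit is bit 63 of the low half and checks that the bit does not appear in the high half.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical demo helper: returns 1 when the bit shifted out of the
 * low 64-bit half is dropped instead of being carried into the high half. */
static int carry_is_lost(void)
{
    /* high half = 0, low half = only bit 63 set */
    __m128i x = _mm_set_epi64x(0, (long long)(1ULL << 63));
    /* shift both 64-bit halves left by 1; the count lives in an XMM register */
    __m128i y = _mm_sll_epi64(x, _mm_cvtsi32_si128(1));

    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, y);   /* out[0] = low half, out[1] = high half */
    return out[0] == 0 && out[1] == 0;
}
```

A true 128-bit shift would have produced a high half of 1 here.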

For a shift by N bits, I imagine I could do something like:


  • _mm_sll_epi64
  • _mm_srl_epi64 (for the bits I want to carry: move them into the low-order position)
  • shuffle the srl result
  • OR these together.

(but I would probably also have to include checks of N relative to 64).
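The steps listed above can be sketched as follows for a run-time bit count. This is only an illustration of the idea in the question, not the approach of the answer below; the function name shl128_var is hypothetical. It relies on the documented behavior that the 64-bit shift instructions produce zero when the count exceeds 63, which makes the N = 0 case work without a special branch.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: logical left shift of a 128-bit value by a
 * run-time bit count n. Counts of 128 or more give zero. */
static inline __m128i shl128_var(__m128i x, unsigned n)
{
    if (n >= 128)
        return _mm_setzero_si128();

    /* Low half moved into the high half; the low half becomes zero. */
    __m128i moved = _mm_slli_si128(x, 8);

    if (n >= 64)
        /* Everything that survives comes from the old low half. */
        return _mm_sll_epi64(moved, _mm_cvtsi32_si128((int)(n - 64)));

    /* Shift both halves, then OR in the bits that cross from low to high.
     * For n == 0 the right-shift count is 64, which yields all zeros. */
    __m128i halves = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    __m128i carry  = _mm_srl_epi64(moved, _mm_cvtsi32_si128((int)(64 - n)));
    return _mm_or_si128(halves, carry);
}
```

The byte shift by 8 plays the role of the shuffle: it places the low half where its top bits can be shifted down into the bottom of the high half.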

Is there a better way?

Recommended answer

This came up as a side issue in a blog post (of mine) on unusual C preprocessor uses. For the 127 different shift offsets, there are four different optimal sequences of SSE2 instructions for a bit shift. The preprocessor makes it reasonable to construct a shift function that amounts to a 129-way switch statement. Pardon the raw code here; I'm unfamiliar with posting code directly here. Check the blog post for an explanation of what's going on.

#include <emmintrin.h>

typedef __m128i XMM;
#define xmbshl(x,n)  _mm_slli_si128(x,n) // xm <<= 8*n  -- BYTE shift left
#define xmbshr(x,n)  _mm_srli_si128(x,n) // xm >>= 8*n  -- BYTE shift right
#define xmshl64(x,n) _mm_slli_epi64(x,n) // xm.hi <<= n, xm.lo <<= n
#define xmshr64(x,n) _mm_srli_epi64(x,n) // xm.hi >>= n, xm.lo >>= n
#define xmand(a,b)   _mm_and_si128(a,b)
#define xmor(a,b)    _mm_or_si128(a,b)
#define xmxor(a,b)   _mm_xor_si128(a,b)
#define xmzero       _mm_setzero_si128()

XMM xm_shl(XMM x, unsigned nbits)
{
    // These macros generate (1,2,5,6) SSE2 instructions, respectively:
    // F1: whole-byte shift only.
    // F2: byte shift, then shift each 64-bit half by the leftover (n)&7 bits.
    #define F1(n) case 8*(n): x = xmbshl(x, n); break;
    #define F2(n) case n: x = xmshl64(xmbshl(x, (n)>>3), (n)&7); break;
    // F5, F6: shift both halves, then OR in the bits carried from the low half into the high half.
    #define F5(n) case n: x = xmor(xmshl64(x, n), xmshr64(xmbshl(x, 8), 64-(n))); break;
    #define F6(n) case n: x = xmor(xmshl64(xmbshl(x, (n)>>3), (n)&7),\
                                  xmshr64(xmbshl(x, 8+((n)>>3)), 64-((n)&7))); break;
    // These macros expand to 7 or 49 cases each:
    #define DO_7(f,x) f((x)+1) f((x)+2) f((x)+3) f((x)+4) f((x)+5) f((x)+6) f((x)+7)
    #define DO_7x7(f,y) DO_7(f,(y)+1*8) DO_7(f,(y)+2*8) DO_7(f,(y)+3*8) DO_7(f,(y)+4*8) \
                                        DO_7(f,(y)+5*8) DO_7(f,(y)+6*8) DO_7(f,(y)+7*8)
    switch (nbits) {
    case 0: break;
    DO_7(F5, 0) // 1..7
    DO_7(F1, 0) // 8,16,..56
    DO_7(F1, 7) // 64,72,..112
    F1(15)      // 120
    DO_7x7(F6, 0) // 9..15 17..23 ... 57..63 i.e. [9..63]\[16,24,..,56]
    DO_7x7(F2,56) // 65..71 73..79 ... 113..119
    DO_7(F2,120)  // 121..127
    default: x = xmzero; // nbits >= 128
    }
    return x;
}

xm_shr amounts to the above, but with "shl" and "shr" swapped everywhere in the F[1256] macros. HTH.
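As a sketch of what that swap might look like, here is a right-shift counterpart written out in full. It is reconstructed from the description above rather than taken from the blog post, so the macro names R1, R2, R5 and R6 and the exact case layout are assumptions. It calls the intrinsics directly instead of going through the xm* wrapper macros so that it stands alone.

```c
#include <emmintrin.h>
#include <stdint.h>

typedef __m128i XMM;

/* Assumed mirror image of xm_shl: logical right shift of all 128 bits. */
XMM xm_shr(XMM x, unsigned nbits)
{
    /* R1: whole-byte shift only. */
    #define R1(n) case 8*(n): x = _mm_srli_si128(x, n); break;
    /* R2: byte shift, then shift each 64-bit half by the leftover (n)&7 bits. */
    #define R2(n) case n: x = _mm_srli_epi64(_mm_srli_si128(x, (n)>>3), (n)&7); break;
    /* R5, R6: also OR in the bits carried from the high half down into the low half. */
    #define R5(n) case n: x = _mm_or_si128(_mm_srli_epi64(x, n), \
                      _mm_slli_epi64(_mm_srli_si128(x, 8), 64-(n))); break;
    #define R6(n) case n: x = _mm_or_si128( \
                      _mm_srli_epi64(_mm_srli_si128(x, (n)>>3), (n)&7), \
                      _mm_slli_epi64(_mm_srli_si128(x, 8+((n)>>3)), 64-((n)&7))); break;
    /* Case generators: 7 consecutive cases, or 7 groups of 7. */
    #define DO_7(f,b) f((b)+1) f((b)+2) f((b)+3) f((b)+4) f((b)+5) f((b)+6) f((b)+7)
    #define DO_7x7(f,b) DO_7(f,(b)+1*8) DO_7(f,(b)+2*8) DO_7(f,(b)+3*8) DO_7(f,(b)+4*8) \
                        DO_7(f,(b)+5*8) DO_7(f,(b)+6*8) DO_7(f,(b)+7*8)
    switch (nbits) {
    case 0: break;
    DO_7(R5, 0)      /* 1..7 */
    DO_7(R1, 0)      /* 8,16,..,56 */
    DO_7(R1, 7)      /* 64,72,..,112 */
    R1(15)           /* 120 */
    DO_7x7(R6, 0)    /* 9..63 except multiples of 8 */
    DO_7x7(R2, 56)   /* 65..119 except multiples of 8 */
    DO_7(R2, 120)    /* 121..127 */
    default: x = _mm_setzero_si128(); /* nbits >= 128 */
    }
    #undef R1
    #undef R2
    #undef R5
    #undef R6
    #undef DO_7
    #undef DO_7x7
    return x;
}
```

Every byte-shift count in the case bodies is a compile-time constant, which is what lets _mm_srli_si128 be used even though nbits itself is only known at run time.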
