Shift a __m128i by n bits

Question

I have a __m128i variable and I need to shift its 128-bit value by n bits, i.e. the way _mm_srli_si128 and _mm_slli_si128 work, but by bits instead of bytes. What is the most efficient way of doing this?

Recommended answer

This is the best that I could come up with for left/right shifts by an immediate count using SSE2:

#include <stdio.h>
#include <emmintrin.h>

/* Shift the 128-bit value v left by n bits; n must be a compile-time constant. */
#define SHL128(v, n) \
({ \
    __m128i v1, v2; \
 \
    if ((n) >= 64) \
    { \
        v1 = _mm_slli_si128(v, 8); \
        v1 = _mm_slli_epi64(v1, (n) - 64); \
    } \
    else \
    { \
        v1 = _mm_slli_epi64(v, n); \
        v2 = _mm_slli_si128(v, 8); \
        v2 = _mm_srli_epi64(v2, 64 - (n)); \
        v1 = _mm_or_si128(v1, v2); \
    } \
    v1; \
})

/* Shift the 128-bit value v right by n bits (zero fill); n must be a compile-time constant. */
#define SHR128(v, n) \
({ \
    __m128i v1, v2; \
 \
    if ((n) >= 64) \
    { \
        v1 = _mm_srli_si128(v, 8); \
        v1 = _mm_srli_epi64(v1, (n) - 64); \
    } \
    else \
    { \
        v1 = _mm_srli_epi64(v, n); \
        v2 = _mm_srli_si128(v, 8); \
        v2 = _mm_slli_epi64(v2, 64 - (n)); \
        v1 = _mm_or_si128(v1, v2); \
    } \
    v1; \
})

int main(void)
{
    __m128i va = _mm_setr_epi8(0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f);
    __m128i vb, vc;

    vb = SHL128(va, 4);
    vc = SHR128(va, 4);

    printf("va = %02vx\n", va);
    printf("vb = %02vx\n", vb);
    printf("vc = %02vx\n", vc);
    printf("\n");

    vb = SHL128(va, 68);
    vc = SHR128(va, 68);

    printf("va = %02vx\n", va);
    printf("vb = %02vx\n", vb);
    printf("vc = %02vx\n", vc);

    return 0;
}

Test:

$ gcc -Wall -msse2 shift128.c && ./a.out
va = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
vb = 00 10 20 30 40 50 60 70 80 90 a0 b0 c0 d0 e0 f0
vc = 10 20 30 40 50 60 70 80 90 a0 b0 c0 d0 e0 f0 00

va = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
vb = 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70
vc = 90 a0 b0 c0 d0 e0 f0 00 00 00 00 00 00 00 00 00
$ 

Note that the SHL128/SHR128 macros rely on the GCC statement-expression extension (the ({ ... }) syntax), which gcc, clang and some other compilers support. If your compiler does not support this extension, the macros will need to be adapted.
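
If statement expressions are not available, one possible adaptation is to use ordinary inline functions. The sketch below is not the method from the answer: it swaps the immediate-shift intrinsics for _mm_sll_epi64/_mm_srl_epi64, which take the shift count in a register, so n may also be a run-time value. The names shl128, shr128 and count are made up for illustration, and the branch is only expected to disappear when the compiler can see that n is constant.

```c
#include <emmintrin.h>

/* Hypothetical helper: put a shift count into the low 64 bits of a vector. */
static inline __m128i count(unsigned n)
{
    return _mm_cvtsi32_si128((int)n);
}

/* Hypothetical function form of SHL128: shift v left by n bits. */
static inline __m128i shl128(__m128i v, unsigned n)
{
    /* Low 64-bit half moved into the high half; the low half becomes zero. */
    __m128i lo_to_hi = _mm_slli_si128(v, 8);

    if (n >= 64)
        return _mm_sll_epi64(lo_to_hi, count(n - 64));

    /* Shift both halves, then OR in the bits carried from low to high.
       For n == 0 the carry shift count is 64, which yields zero. */
    __m128i shifted = _mm_sll_epi64(v, count(n));
    __m128i carry = _mm_srl_epi64(lo_to_hi, count(64 - n));
    return _mm_or_si128(shifted, carry);
}

/* Hypothetical function form of SHR128: shift v right by n bits, zero fill. */
static inline __m128i shr128(__m128i v, unsigned n)
{
    /* High 64-bit half moved into the low half; the high half becomes zero. */
    __m128i hi_to_lo = _mm_srli_si128(v, 8);

    if (n >= 64)
        return _mm_srl_epi64(hi_to_lo, count(n - 64));

    __m128i shifted = _mm_srl_epi64(v, count(n));
    __m128i carry = _mm_sll_epi64(hi_to_lo, count(64 - n));
    return _mm_or_si128(shifted, carry);
}
```

With these, a call such as shl128(va, 4) stands in for SHL128(va, 4). The register-count shifts produce zero for counts above 63, which is what keeps the n == 0 and n >= 128 cases well defined here.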

Note also that the printf extension for SIMD types used in the test harness (the %vx conversion) works with Apple gcc, clang, et al. If your compiler does not support it and you want to test the code, you will need to implement your own SIMD print routine.
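
A print routine of that kind can be quite small. The following is a minimal sketch using only standard C and SSE2: it stores the vector to a byte array and formats each byte as two hex digits, lowest address first, which matches the layout of the output shown above. The names format_m128i and print_m128i are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

/* Hypothetical helper: write the 16 bytes of v into buf as
   "00 01 ... 0f" (47 characters plus the terminating NUL). */
static inline void format_m128i(char buf[48], __m128i v)
{
    static const char hex[] = "0123456789abcdef";
    uint8_t bytes[16];

    /* Unaligned store, so the array needs no special alignment. */
    _mm_storeu_si128((__m128i *)bytes, v);

    for (int i = 0; i < 16; ++i) {
        buf[3 * i]     = hex[bytes[i] >> 4];
        buf[3 * i + 1] = hex[bytes[i] & 0x0f];
        /* Space between bytes, NUL after the last one. */
        buf[3 * i + 2] = (i < 15) ? ' ' : '\0';
    }
}

/* Hypothetical helper: print "name = 00 01 ... 0f" on one line. */
static inline void print_m128i(const char *name, __m128i v)
{
    char buf[48];

    format_m128i(buf, v);
    printf("%s = %s\n", name, buf);
}
```

In the test harness, a call such as print_m128i("va", va) would then take the place of printf("va = %02vx\n", va).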

Note on performance: the if/else branch is optimised away as long as n is a compile-time constant (which the immediate-shift intrinsics require anyway), so you get 2 instructions for the n >= 64 case and 4 instructions for the n < 64 case.
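
The two cases counted above (n >= 64 and n < 64) can also be modelled in plain C on two 64-bit halves, which may be handy for cross-checking results. This is only a reference sketch: the u128 struct and the shl128_ref/shr128_ref names are hypothetical. Unlike the SSE2 shifts, a scalar shift by 64 or more is undefined in C, so n == 0 and n >= 128 are handled explicitly.

```c
#include <stdint.h>

/* Hypothetical 128-bit value as two 64-bit halves (lo = bits 0..63). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Reference left shift by n bits. */
static inline u128 shl128_ref(u128 v, unsigned n)
{
    u128 r = {0, 0};

    if (n >= 128)
        return r;               /* everything shifted out */
    if (n >= 64) {
        r.hi = v.lo << (n - 64); /* low half moves up, low half is zero */
        return r;
    }
    if (n == 0)
        return v;               /* avoid the undefined shift by 64 below */

    r.lo = v.lo << n;
    r.hi = (v.hi << n) | (v.lo >> (64 - n)); /* carry bits from lo into hi */
    return r;
}

/* Reference right shift by n bits, zero fill. */
static inline u128 shr128_ref(u128 v, unsigned n)
{
    u128 r = {0, 0};

    if (n >= 128)
        return r;
    if (n >= 64) {
        r.lo = v.hi >> (n - 64); /* high half moves down, high half is zero */
        return r;
    }
    if (n == 0)
        return v;

    r.lo = (v.lo >> n) | (v.hi << (64 - n)); /* carry bits from hi into lo */
    r.hi = v.hi >> n;
    return r;
}
```

On a little-endian machine the test vector va corresponds to lo = 0x0706050403020100 and hi = 0x0f0e0d0c0b0a0908.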
