I'm having trouble understanding how the AVX shuffle intrinsics work on 8-bit elements


Problem description

I'm trying to pack 16-bit data down to 8 bits using _mm256_shuffle_epi8, but the result I get is not what I expect.


auto srcData = _mm256_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 
                               17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32);

__m256i vperm = _mm256_setr_epi8( 0,  2,  4,  6,  8, 10, 12, 14,
                                 16, 18, 20, 22, 24, 26, 28, 30,
                                 -1, -1, -1, -1, -1, -1, -1, -1,
                                 -1, -1, -1, -1, -1, -1, -1, -1);

auto result = _mm256_shuffle_epi8(srcData, vperm);

I expect the result to contain:

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
0, 0, 0, 0, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0

But I get:

1, 3, 5, 7, 9, 11, 13, 15,  1,  3,  5,  7,  9, 11, 13, 15,
0, 0, 0, 0, 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0

I have surely misunderstood how the shuffle works. If anyone can enlighten me, it would be much appreciated :)

Recommended answer

Yes, that is to be expected. Look at the docs for _mm_shuffle_epi8. The 256-bit version (_mm256_shuffle_epi8, which requires AVX2) simply duplicates the behaviour of that 128-bit instruction for each of the two 16-byte halves of the YMM register.

So you can shuffle bytes within the first 16 values, or within the last 16 values, but you cannot move a value across the 16-byte boundary. Only the low 4 bits of each control byte are used as an index, and the index always refers to the same 16-byte half as the destination byte. (You'll notice that every value over 16 you expected came out as the same value minus 16, e.g. 19->3, 31->15, etc.)
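The per-lane behaviour can be illustrated with a small scalar model. This is only a sketch for reasoning about the instruction, not how it is implemented; the names Bytes32 and shuffle_epi8_model are made up for this example.

```cpp
#include <array>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model (illustrative only) of what _mm256_shuffle_epi8 computes.
Bytes32 shuffle_epi8_model(const Bytes32& src,
                           const std::array<std::int8_t, 32>& ctrl) {
    Bytes32 out{};
    for (int i = 0; i < 32; ++i) {
        // 0 for destination bytes in the low lane, 16 for the high lane.
        const int lane_base = (i / 16) * 16;
        const std::uint8_t c = static_cast<std::uint8_t>(ctrl[i]);
        if (c & 0x80) {
            // High bit of the control byte set (e.g. -1): the result byte is zero.
            out[i] = 0;
        } else {
            // Only the low 4 bits select, and only within the same 16-byte lane.
            out[i] = src[lane_base + (c & 0x0F)];
        }
    }
    return out;
}
```

Fed with the source data and the control vector from the question, this model gives 1, 3, ..., 15, 1, 3, ..., 15 in the low lane (index 16 wraps to 0, 18 to 2, and so on) and zeros in the high lane, which matches the output reported above.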

You'll need an additional step. First shuffle within each 16-byte half:

__m256i vperm = _mm256_setr_epi8( 0,  2,  4,  6,  8, 10, 12, 14,
                                 -1, -1, -1, -1, -1, -1, -1, -1,
                                  0,  2,  4,  6,  8, 10, 12, 14,
                                 -1, -1, -1, -1, -1, -1, -1, -1);

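and then use a 64-bit cross-lane permute such as _mm256_permute4x64_epi64 to pull the 0th and 2nd 64-bit elements into the first 128 bits. (_mm256_permute2f128_si256 only moves whole 128-bit lanes, so it cannot merge the two 8-byte halves by itself.)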
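Putting the two steps together might look like the sketch below. It uses _mm256_permute4x64_epi64 for the second step, because after the in-lane shuffle the two 8-byte results sit in the low half of each 128-bit lane, and _mm256_permute2f128_si256 can only move whole lanes. The helper name pack_even_bytes and the PACK_AVX2_TARGET macro are made up for this example; the target attribute is an assumption that lets GCC or Clang compile the function without -mavx2, and the code still needs a CPU with AVX2 to run.

```cpp
#include <immintrin.h>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#define PACK_AVX2_TARGET __attribute__((target("avx2")))
#else
#define PACK_AVX2_TARGET
#endif

// Hypothetical helper: reads 32 bytes from `in`, writes the 16 even-indexed
// bytes to out[0..15] and zeros to out[16..31].
PACK_AVX2_TARGET
void pack_even_bytes(const std::uint8_t* in, std::uint8_t* out) {
    const __m256i src =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));

    // Step 1: within each 128-bit lane, gather the even bytes into the
    // low 8 bytes and zero the high 8 bytes.
    const __m256i vperm = _mm256_setr_epi8(
         0,  2,  4,  6,  8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1,
         0,  2,  4,  6,  8, 10, 12, 14, -1, -1, -1, -1, -1, -1, -1, -1);
    const __m256i shuffled = _mm256_shuffle_epi8(src, vperm);

    // Step 2: reorder the four 64-bit elements as [q0, q2, q1, q3], so the
    // two packed halves become the low 128 bits and the zeros move up.
    const __m256i packed =
        _mm256_permute4x64_epi64(shuffled, _MM_SHUFFLE(3, 1, 2, 0));

    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), packed);
}
```

With the bytes 1 through 32 as input, the first 16 output bytes are 1, 3, 5, ..., 31 and the remaining 16 are zero, which is the result the question was after.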
