How can I convert a vector of float to short int using AVX instructions?
Question
Basically, how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].
for (i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];
I know that floats can be converted to 32-bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but I have no idea how to convert these 32-bit integers further to 16-bit integers. And I don't want just that: I also want to store those values (as 16-bit integers) to memory, and I want to do all of it using vector instructions.
Searching around the internet, I found an intrinsic by the name of _mm256_mask_storeu_epi16, but I'm not really sure if that would do the trick, as I couldn't find an example of its usage.
Recommended answer
_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying, because it requires a cross-lane shuffle, one that moves data between the two 128-bit halves of the register (so it's good that it's not in a dependency chain here).
Since the values can be assumed to be in the right range (as per the comment), we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but using _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere.
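To make the difference between the two narrowing options concrete, here is a minimal sketch of both. It is not from the original answer: the function names `narrow_saturate` and `narrow_truncate`, the pointer-based interface, and the `AVX2_FN` macro (a GCC/Clang target attribute, used so the snippet builds without `-mavx2`) are all assumptions made for illustration. The pack version saturates out-of-range values, while the byte-shuffle version keeps only the low 16 bits, so the two agree only when every value already fits in a 16-bit integer.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang. Enables AVX2 for these functions only.
   With -mavx2 (or MSVC /arch:AVX2) the macro can be defined as empty. */
#define AVX2_FN __attribute__((target("avx2")))

/* Narrow 8 int32 values to int16 with signed saturation. */
AVX2_FN void narrow_saturate(const int32_t *src, int16_t *dst)
{
    assert(src != NULL && dst != NULL);
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    /* Packs within each 128-bit lane; the upper half of each lane is zero. */
    v = _mm256_packs_epi32(v, _mm256_setzero_si256());
    /* 0xD8 selects qwords 0,2,1,3: both useful qwords land in the low half. */
    v = _mm256_permute4x64_epi64(v, 0xD8);
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}

/* Narrow 8 int32 values to int16 by keeping the low 16 bits of each. */
AVX2_FN void narrow_truncate(const int32_t *src, int16_t *dst)
{
    assert(src != NULL && dst != NULL);
    /* Per 128-bit lane: pick bytes 0,1 of each dword; -1 zeroes the rest. */
    const __m256i mask = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1);
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_shuffle_epi8(v, mask);
    v = _mm256_permute4x64_epi64(v, 0xD8);
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(v));
}
```

For an input of 40000, the first function stores 32767 and the second stores -25536; for inputs already within the 16-bit range both store the same value, which is why the pack version is sufficient under the range assumption above.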
So, to put it together (not tested):
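__m256i tmp = _mm256_cvtps_epi32(result_in_float);      // 8 floats -> 8 32-bit ints
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  // saturating pack to 16 bits, within each 128-bit lane
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              // move both lanes' results into the low 128 bits
__m128i res = _mm256_castsi256_si128(tmp);              // keep the low 128 bits
_mm_storeu_si128((__m128i*)result, res);                // or _mm_store_si128 if result is 16-byte aligned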
The last step (the cast) is free; it just changes the type.
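The sequence can be wrapped into a small function for experimentation. This is a sketch under a few assumptions not in the original answer: the name `float8_to_short8`, the pointer-based interface, unaligned loads and stores, a 16-bit `short`, and the `AVX2_FN` macro (a GCC/Clang target attribute so the snippet builds without `-mavx2`). One behavioral point worth knowing: `_mm256_cvtps_epi32` rounds according to the current rounding mode (round-to-nearest-even by default), whereas the C cast `(short int)x` truncates toward zero. If exact agreement with the cast matters, `_mm256_cvttps_epi32` is the truncating counterpart.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC or Clang. Enables AVX2 for this function only.
   With -mavx2 (or MSVC /arch:AVX2) the macro can be defined as empty. */
#define AVX2_FN __attribute__((target("avx2")))

/* Convert 8 floats at src to 8 shorts at dst (hypothetical helper). */
AVX2_FN void float8_to_short8(const float *src, short *dst)
{
    assert(src != NULL && dst != NULL);
    __m256 f = _mm256_loadu_ps(src);
    /* Float -> int32, rounding per the current rounding mode. */
    __m256i tmp = _mm256_cvtps_epi32(f);
    /* Int32 -> int16 with saturation, within each 128-bit lane. */
    tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());
    /* Bring the two lanes' results together in the low 128 bits. */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);
    /* The cast generates no instruction; the store writes 16 bytes. */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

Calling it with a `float[8]` and a `short[8]` fills the output in the same element order as the scalar loop in the question.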
If you had two vectors of floats to convert, you could re-use most of the instructions, e.g. (not tested either):
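__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);  // first 8 floats -> 8 32-bit ints
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);  // next 8 floats -> 8 32-bit ints
tmp1 = _mm256_packs_epi32(tmp1, tmp2);                // saturating pack to 16 bits, interleaved per 128-bit lane
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);          // restore element order across the lanes
_mm256_storeu_si256((__m256i*)result, tmp1);          // result must hold 16 shorts; _mm256_store_si256 if 32-byte aligned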
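The two-vector form extends naturally to whole arrays. The following is a rough sketch rather than the answer's own code: the function name `floats_to_shorts`, the scalar tail loop, and the `AVX2_FN` macro (a GCC/Clang target attribute) are assumptions for illustration. It also swaps in `_mm256_cvttps_epi32`, the truncating conversion, instead of `_mm256_cvtps_epi32`, so that the vector body agrees with the `(short)` cast used for the leftover elements. Inputs are assumed to fit in a 16-bit `short`.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC or Clang. Enables AVX2 for this function only.
   With -mavx2 (or MSVC /arch:AVX2) the macro can be defined as empty. */
#define AVX2_FN __attribute__((target("avx2")))

/* Convert n floats to shorts, 16 at a time, with a scalar tail. */
AVX2_FN void floats_to_shorts(const float *src, short *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        /* Truncating conversion, chosen here to match the C cast. */
        __m256i lo = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i));
        __m256i hi = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i + 8));
        /* Per-lane pack yields qwords lo[0..3], hi[0..3], lo[4..7], hi[4..7]. */
        __m256i packed = _mm256_packs_epi32(lo, hi);
        /* 0xD8 reorders the qwords to lo[0..3], lo[4..7], hi[0..3], hi[4..7]. */
        packed = _mm256_permute4x64_epi64(packed, 0xD8);
        _mm256_storeu_si256((__m256i *)(dst + i), packed);
    }
    /* Fewer than 16 elements remain: fall back to the scalar cast. */
    for (; i < n; ++i)
        dst[i] = (short)src[i];
}
```

Processing 16 elements per iteration means one pack and one permute serve two conversions, which is the instruction re-use described above.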