How can I convert a vector of float to short int using AVX instructions?

Question

Basically how can I write the equivalent of this with AVX2 intrinsics? We assume here that result_in_float is of type __m256, while result is of type short int* or short int[8].

for(i = 0; i < 8; i++)
    result[i] = (short int)result_in_float[i];

I know that floats can be converted to 32 bit integers using the __m256i _mm256_cvtps_epi32(__m256 m1) intrinsic, but have no idea how to convert these 32 bit integers further to 16 bit integers. And I don't want just that but also to store those values (in the form of 16 bit integers) to the memory, and I want to do that all using vector instructions.

Searching around the internet, I found an intrinsic named _mm256_mask_storeu_epi16, but I'm not sure whether it would do the trick, as I couldn't find an example of its usage.

Recommended answer

_mm256_cvtps_epi32 is a good first step. The conversion to a packed vector of shorts is a bit annoying because it requires a cross-lane ("cross-slice") shuffle, so it's good that it's not in a dependency chain here.

Since the values can be assumed to be in the right range (as per the comment), we can use _mm256_packs_epi32 instead of _mm256_shuffle_epi8 to do the conversion. Either way it's a 1-cycle instruction on port 5, but _mm256_packs_epi32 avoids having to get a shuffle mask from somewhere.
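
To see why a cross-lane fix-up is still needed after the pack, here is a rough scalar model of what _mm256_packs_epi32 does. The function names are made up for illustration; the point is only that each 128-bit lane is packed independently (with signed saturation), so the output order is a0-3, b0-3, a4-7, b4-7 rather than a0-7, b0-7.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit value to the int16_t range, as the pack instruction does. */
static int16_t saturate_i32_to_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model (illustrative name) of _mm256_packs_epi32(a, b).
   Each 128-bit lane holds 4 int32 inputs from a and 4 from b, and is
   packed on its own, so the 16 outputs come out as:
   a0-3, b0-3 (lane 0), then a4-7, b4-7 (lane 1). */
static void packs_epi32_model(const int32_t a[8], const int32_t b[8],
                              int16_t out[16])
{
    assert(a && b && out);
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            out[lane * 8 + i]     = saturate_i32_to_i16(a[lane * 4 + i]);
            out[lane * 8 + 4 + i] = saturate_i32_to_i16(b[lane * 4 + i]);
        }
    }
}
```

With b set to all zeros, the useful values end up in 16-bit elements 0-3 and 8-11, which is why a 64-bit permute is needed to bring them together.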

So, putting it together (not tested):

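__m256i tmp = _mm256_cvtps_epi32(result_in_float);      // 8 floats -> 8 x int32
tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  // saturating pack within each 128-bit lane
tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              // move both packed halves into the low 128 bits
__m128i res = _mm256_castsi256_si128(tmp);              // take the low 128 bits
_mm_storeu_si128((__m128i *)result, res);               // or _mm_store_si128 if result is 16-byte aligned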

The last step (the cast) is free; it only changes the type.
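
For reference, the same sequence wrapped into a self-contained function might look like the sketch below. The function name and the pointer-based interface are assumptions of mine, and the `target("avx2")` attribute is a GCC/Clang extension used so the file compiles without -mavx2; with -mavx2 on the command line it can be dropped. One difference from the scalar loop: the C cast truncates toward zero, while _mm256_cvtps_epi32 rounds according to the current MXCSR rounding mode (round-to-nearest-even by default). If truncation is required, _mm256_cvttps_epi32 can be substituted.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: converts 8 floats at src to 8 shorts at dst.
   Assumes an x86 CPU with AVX2 and values that fit in a 16-bit integer
   (out-of-range int32 values are saturated by the pack step). */
__attribute__((target("avx2")))
void convert8_float_to_short(const float *src, short *dst)
{
    assert(src && dst);
    __m256 f = _mm256_loadu_ps(src);                        /* unaligned load */
    __m256i tmp = _mm256_cvtps_epi32(f);                    /* rounds per MXCSR */
    tmp = _mm256_packs_epi32(tmp, _mm256_setzero_si256());  /* in-lane pack */
    tmp = _mm256_permute4x64_epi64(tmp, 0xD8);              /* qwords 0,2,1,3 */
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(tmp));
}
```

The unaligned load and store are used because nothing is known about the alignment of the buffers; the aligned variants work equally well when alignment is guaranteed.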

If you had two vectors of floats to convert, you could reuse most of the instructions, e.g. (also not tested):

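__m256i tmp1 = _mm256_cvtps_epi32(result_in_float1);
__m256i tmp2 = _mm256_cvtps_epi32(result_in_float2);
tmp1 = _mm256_packs_epi32(tmp1, tmp2);          // per lane: a0-3, b0-3 | a4-7, b4-7
tmp1 = _mm256_permute4x64_epi64(tmp1, 0xD8);    // reorder to a0-7, b0-7
_mm256_storeu_si256((__m256i *)result, tmp1);   // result must hold 16 shorts; _mm256_store_si256 if 32-byte aligned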
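
The two-vector form extends naturally to a whole array. The following is a sketch under my own assumptions, not part of the answer: the function name is hypothetical, `target("avx2")` is a GCC/Clang extension, and _mm256_cvttps_epi32 (truncating) is swapped in for _mm256_cvtps_epi32 so that the vector part agrees with the plain C cast used for the leftover elements. Inputs are assumed to fit in a 16-bit integer.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper: converts n floats to shorts, 16 at a time,
   truncating toward zero like the C cast in the question. */
__attribute__((target("avx2")))
void convert_floats_to_shorts(const float *src, short *dst, size_t n)
{
    assert(src && dst);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m256i lo = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i));
        __m256i hi = _mm256_cvttps_epi32(_mm256_loadu_ps(src + i + 8));
        __m256i packed = _mm256_packs_epi32(lo, hi);      /* a0-3,b0-3,a4-7,b4-7 */
        packed = _mm256_permute4x64_epi64(packed, 0xD8);  /* a0-7,b0-7 */
        _mm256_storeu_si256((__m256i *)(dst + i), packed);
    }
    for (; i < n; i++)           /* scalar tail for the last n % 16 elements */
        dst[i] = (short)src[i];
}
```

Processing two input vectors per iteration means each pack and permute produces a full 256-bit result, so none of the shuffle work is spent on zero padding.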
