SSE intrinsics: Convert 32-bit floats to UNSIGNED 8-bit integers

Question

Using SSE intrinsics, I've gotten a vector of four 32-bit floats clamped to the range 0-255 and rounded to the nearest integer. I'd now like to write those four out as bytes.

There is an intrinsic, _mm_cvtps_pi8, that converts 32-bit floats to 8-bit signed integers, but any value over 127 gets clamped to 127. I can't find any instruction that clamps to unsigned 8-bit values.
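
To see the problem concretely, here is a small sketch that runs four floats through _mm_cvtps_pi8 and reads back the signed bytes. It assumes GCC or Clang on x86-64, where the MMX-based intrinsics are available, and the helper name cvtps_pi8_demo is made up for illustration:

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ps, _mm_cvtps_pi8 (also pulls in MMX) */
#include <string.h>

/* Hypothetical helper: converts four floats with _mm_cvtps_pi8 and
   returns the four signed bytes it produces. */
static void cvtps_pi8_demo(float a, float b, float c, float d, signed char r[4])
{
    /* _mm_set_ps lists lanes from highest to lowest, so lane 0 = a. */
    __m128 x = _mm_set_ps(d, c, b, a);
    __m64 packed = _mm_cvtps_pi8(x);    /* signed saturation to [-128, 127] */
    int lo = _mm_cvtsi64_si32(packed);  /* the four results occupy the low 32 bits */
    _mm_empty();                        /* clear MMX state before any x87 code runs */
    memcpy(r, &lo, 4);                  /* little-endian: r[0] is lane 0 */
}
```

With the inputs 500, 0, 120 and 240 the bytes come back as 127, 0, 120 and 127: both values above 127 hit the signed ceiling.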

I have an intuition that what I may want to do is some combination of _mm_cvtps_pi16 and _mm_shuffle_pi8, followed by a move instruction to get the four bytes I care about into memory. Is that the best way to do it? I'm going to see if I can figure out how to encode the shuffle control mask.
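
For reference, pshufb (which _mm_shuffle_pi8 wraps) builds each destination byte from one control byte: if bit 7 is set the byte is zeroed, otherwise the low bits index into the source (3 bits for the 64-bit MMX form, 4 bits for the 128-bit form). The scalar model below is only an illustration with a hypothetical function name; it shows why a mask of 0, 2, 4, 6 followed by 128s picks the low byte of each little-endian 16-bit lane:

```c
#include <stdint.h>

/* Hypothetical scalar model of the 64-bit (MMX) pshufb behind _mm_shuffle_pi8:
   a control byte with bit 7 set zeroes the destination byte; otherwise its
   low 3 bits select one of the 8 source bytes. */
static void pshufb64_model(const uint8_t src[8], const uint8_t ctrl[8], uint8_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x07];
}
```

Fed the bytes of the 16-bit values 10, 200, 120 and 255 with the mask from the question, the model returns 10, 200, 120, 255 followed by four zero bytes.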

UPDATE: The following appears to do exactly what I want. Is there a better way?

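#include <tmmintrin.h>  // SSSE3: _mm_shuffle_pi8 (also pulls in the MMX/SSE headers)
#include <stdio.h>
#include <string.h>

unsigned char out[8];
unsigned char shuf[8] = { 0, 2, 4, 6, 128, 128, 128, 128 }; // 128 (bit 7 set) zeroes that byte
float ins[4] = {500, 0, 120, 240};  // 500 is out of range: only its low byte (244) survives the shuffle

int main()
{
    __m128 x = _mm_loadu_ps(ins);      // Load the floats (ins is not guaranteed to be 16-byte aligned)
    __m64 y = _mm_cvtps_pi16(x);       // Convert them to 16-bit ints
    __m64 sh;
    memcpy(&sh, shuf, sizeof sh);      // Get the shuffle mask into a register without an aliasing cast
    y = _mm_shuffle_pi8(y, sh);        // Shuffle the lower byte of each into the first four bytes
    int lo = _mm_cvtsi64_si32(y);      // Take the lower 32 bits
    _mm_empty();                       // Leave MMX state before any x87 floating-point code
    memcpy(out, &lo, sizeof lo);       // Store the four bytes

    printf("%d\n", out[0]);
    printf("%d\n", out[1]);
    printf("%d\n", out[2]);
    printf("%d\n", out[3]);
    return 0;
}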

UPDATE2: Here's an even better solution based on Harold's answer:

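#include <emmintrin.h>  // SSE2 is enough: the first pack is signed (packssdw)
#include <stdio.h>
#include <string.h>

unsigned char out[8];
float ins[4] = {10.4f, 10.6f, 120, 100000};

int main()
{
    __m128 x = _mm_loadu_ps(ins);      // Load the floats (ins is not guaranteed to be 16-byte aligned)
    __m128i y = _mm_cvtps_epi32(x);    // Convert them to 32-bit ints (round to nearest by default)
    y = _mm_packs_epi32(y, y);         // Pack down to 16 bits with SIGNED saturation; _mm_packus_epi32
                                       // (packusdw) would turn 100000 into 0xFFFF, which the next step reads as -1
    y = _mm_packus_epi16(y, y);        // Pack down to 8 bits with unsigned saturation
    int lo = _mm_cvtsi128_si32(y);     // Take the lower 32 bits
    memcpy(out, &lo, sizeof lo);       // Store the four bytes without an aliasing cast

    printf("%d\n", out[0]);
    printf("%d\n", out[1]);
    printf("%d\n", out[2]);
    printf("%d\n", out[3]);
    return 0;
}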

Recommended answer

There is no direct conversion from float to byte; _mm_cvtps_pi8 is a composite (the compiler expands it into several instructions). _mm_cvtps_pi16 is also a composite, and in this case it just does some pointless work that you then undo with the shuffle. Both also return annoying __m64 (MMX) values.

Anyway, we can convert to dwords (signed, but that doesn't matter) and then either pack them, saturating to unsigned bytes at the last step, or shuffle them into bytes. _mm_shuffle_(e)pi8 generates a pshufb, which first-generation (65 nm) Core 2 and some AMD processors handle poorly, and you also have to load a mask from somewhere.

Either way, you don't have to round to the nearest integer first; the conversion does that for you, at least if you haven't changed the rounding mode. cvtps2dq rounds according to the MXCSR rounding mode, which defaults to round-to-nearest-even.
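
A quick way to check this is to compare _mm_cvtps_epi32 (cvtps2dq, which follows the MXCSR rounding mode) with _mm_cvttps_epi32 (cvttps2dq, which always truncates). This is a sketch that assumes the default MXCSR state; the function name round_demo is hypothetical:

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */

/* Hypothetical helper: converts four floats once with the current rounding
   mode (cvtps2dq) and once with truncation (cvttps2dq). */
static void round_demo(const float in[4], int rounded[4], int truncated[4])
{
    __m128 x = _mm_loadu_ps(in);                               /* unaligned load */
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(x));  /* round per MXCSR */
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(x)); /* round toward zero */
}
```

For the inputs 0.5, 1.5, 2.5 and 10.6, the rounded results should be 0, 2, 2 and 11 (ties go to the even integer), while the truncated ones are 0, 1, 2 and 10.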

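Using packs 1: (not tested). Probably not useful: packusdw (SSE4.1) already outputs unsigned words, but packuswb then interprets its input as signed words, so any intermediate value above 32767 is read as negative and becomes 0 instead of 255. Kept around because it is referred to elsewhere.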

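cvtps2dq xmm0, xmm0     ; 4 floats -> 4 signed dwords (rounded per MXCSR)
packusdw xmm0, xmm0     ; unsafe: saturates to a different range than packuswb accepts
packuswb xmm0, xmm0     ; words (read as signed) -> unsigned bytes
movd somewhere, xmm0    ; store the low 4 bytes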
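
The failure mode of this first variant is easy to reproduce with intrinsics. This sketch assumes GCC or Clang (it uses __attribute__((target("sse4.1"))) so that only one function needs SSE4.1) and uses hypothetical helper names; each helper broadcasts one int32 value and returns the resulting byte from lane 0:

```c
#include <emmintrin.h>  /* SSE2: _mm_packs_epi32, _mm_packus_epi16 */
#include <smmintrin.h>  /* SSE4.1: _mm_packus_epi32 (packusdw) */

/* Hypothetical helper for "packs 1": packusdw, then packuswb. */
__attribute__((target("sse4.1")))
static int pack_via_packusdw(int v)
{
    __m128i d = _mm_set1_epi32(v);
    __m128i w = _mm_packus_epi32(d, d);  /* unsigned saturation to [0, 65535] */
    __m128i b = _mm_packus_epi16(w, w);  /* reads words as SIGNED, saturates to [0, 255] */
    return _mm_cvtsi128_si32(b) & 0xFF;  /* byte from lane 0 */
}

/* Hypothetical helper for the corrected route: packssdw, then packuswb (SSE2). */
static int pack_via_packssdw(int v)
{
    __m128i d = _mm_set1_epi32(v);
    __m128i w = _mm_packs_epi32(d, d);   /* signed saturation to [-32768, 32767] */
    __m128i b = _mm_packus_epi16(w, w);  /* unsigned saturation to [0, 255] */
    return _mm_cvtsi128_si32(b) & 0xFF;  /* byte from lane 0 */
}
```

With an input of 200 both helpers return 200, but for 40000 or 100000 the packusdw path returns 0 (the unsigned word 0x9C40 or 0xFFFF is negative when read as signed), whereas the packssdw path returns 255.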

Using different packs:

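cvtps2dq xmm0, xmm0     ; 4 floats -> 4 signed dwords (rounded per MXCSR)
packssdw xmm0, xmm0     ; correct: signed saturation on first step to feed packuswb
packuswb xmm0, xmm0     ; signed words -> unsigned bytes, saturating to [0, 255]
movd somewhere, xmm0    ; store the low 4 bytes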
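
Written with intrinsics, the packssdw route needs only SSE2 (_mm_packs_epi32 is packssdw, _mm_packus_epi16 is packuswb). A sketch with a hypothetical function name, using an unaligned load and memcpy so that neither buffer needs special alignment:

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: four floats -> four saturated unsigned bytes,
   following cvtps2dq -> packssdw -> packuswb -> movd. */
static void floats_to_u8x4(const float in[4], uint8_t out[4])
{
    __m128 f = _mm_loadu_ps(in);         /* unaligned load */
    __m128i d = _mm_cvtps_epi32(f);      /* cvtps2dq: round per MXCSR */
    __m128i w = _mm_packs_epi32(d, d);   /* packssdw: signed saturation to int16 */
    __m128i b = _mm_packus_epi16(w, w);  /* packuswb: unsigned saturation to uint8 */
    int lo = _mm_cvtsi128_si32(b);       /* movd: the low four bytes */
    memcpy(out, &lo, 4);                 /* store without an aliasing cast */
}
```

For {10.4, 10.6, 120, 100000} this should produce {10, 11, 120, 255}. One caveat: cvtps2dq turns NaN and anything outside the int32 range into 0x80000000, which the packs then saturate to 0 rather than 255, so inputs that large need clamping first.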

Using shuffle: (not tested)

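cvtps2dq xmm0, xmm0          ; 4 floats -> 4 signed dwords
pshufb xmm0, [shufmask]      ; low byte of each dword into bytes 0-3 (no saturation)
movd somewhere, xmm0         ; store the low 4 bytes

; 80h (bit 7 set) zeroes a byte; as a legacy-SSE memory operand, shufmask must be 16-byte aligned
shufmask: db 0, 4, 8, 12, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h, 80h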
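
The shuffle route can be written the same way with _mm_shuffle_epi8 (SSSE3). Again a sketch: it assumes GCC or Clang for the target("ssse3") attribute, and the function name is hypothetical. Because pshufb only moves bytes, it does no saturation and relies on the inputs already being in 0-255:

```c
#include <emmintrin.h>  /* SSE2: _mm_setr_epi8, _mm_cvtps_epi32 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: cvtps2dq -> pshufb -> movd. */
__attribute__((target("ssse3")))
static void floats_to_u8x4_shuffle(const float in[4], uint8_t out[4])
{
    /* Same layout as shufmask: byte 0 of each dword, then zeroing bytes (0x80). */
    const __m128i mask = _mm_setr_epi8(0, 4, 8, 12,
                                       -128, -128, -128, -128,
                                       -128, -128, -128, -128,
                                       -128, -128, -128, -128);
    __m128i d = _mm_cvtps_epi32(_mm_loadu_ps(in));  /* cvtps2dq */
    __m128i b = _mm_shuffle_epi8(d, mask);          /* pshufb: keeps low bytes only */
    int lo = _mm_cvtsi128_si32(b);                  /* movd */
    memcpy(out, &lo, 4);
}
```

In-range inputs such as {10.4, 10.6, 120, 255} give {10, 11, 120, 255}, but 300 comes out as 44 (its low byte, 0x2C) and -1 as 255.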
