Conversion of four packed single-precision floating-point values to unsigned doublewords in x86 SSE

Question

Is there a way to convert four packed single precision floating point values to four double words in x86 with SSE extension? The closest instruction would be CVTPS2PI, but it cannot be executed on two xmm registers, instead should be given as CVTPS2PI MM, XMM/M64. What if I want something like <conversion_mnemonic> XMM, XMM/M128?

Thanks. Iman.

Recommended answer

x86 doesn't have native support for FP<->unsigned conversion until AVX-512, with vcvtps2udq (https://www.felixcloutier.com/x86/vcvtps2udq). For scalar you normally just convert to 64-bit signed (cvtss2si rax, xmm0) and take the low 32 bits of that (in EAX), but that's not an option with SIMD.
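The scalar trick can be sketched in C with intrinsics. This is a minimal sketch, assuming an x86-64 target (where `_mm_cvtss_si64` is available); the helper name `float_to_u32_scalar` is hypothetical and not part of any library.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_cvtss_si64 (x86-64 only) */

/* Hypothetical helper: float -> uint32_t through a 64-bit signed conversion.
 * Intended for inputs in [0, 2^32); other inputs are not handled specially. */
static uint32_t float_to_u32_scalar(float f)
{
    /* Compiles to cvtss2si r64, xmm; rounds using the current MXCSR mode. */
    long long wide = _mm_cvtss_si64(_mm_set_ss(f));

    /* Keep only the low 32 bits (the EAX part of RAX). */
    return (uint32_t)wide;
}
```

Every value in the uint32_t range fits in a signed 64-bit integer, which is why the wide conversion followed by truncation works for a single value.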

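Without AVX-512, ideally you can use a signed conversion (cvtps2dq) and get the same result. That works if your floats are non-negative and <= INT_MAX (2147483647.0). (2147483647 is not exactly representable as a float; the largest float below 2^31 is 2147483520.0.)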
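For inputs known to be in that range, the signed conversion can be used directly. A minimal sketch in C with SSE2 intrinsics; the function name `cvt_nonneg_below_2p31` and the array-based interface are assumptions made for illustration.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32 and integer load/store */
#include <stdint.h>

/* Hypothetical helper: converts four floats that are known to be
 * non-negative and below 2^31. For such inputs the signed result has
 * the same bit pattern as the unsigned one. */
static void cvt_nonneg_below_2p31(const float in[4], uint32_t out[4])
{
    __m128i v = _mm_cvtps_epi32(_mm_loadu_ps(in));  /* cvtps2dq xmm, xmm */
    _mm_storeu_si128((__m128i *)out, v);
}
```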

See "How to efficiently perform double/int64 conversions with SSE/AVX?" for a related double->uint64_t conversion. The full-range one should be adaptable from double->uint64_t to float->uint32_t if you need it.

Another possibility (for 32-bit float->uint32_t) is to range-shift to signed in FP, then flip back in integer: INT32_MIN ^ convert(x + INT32_MIN). But that introduces FP rounding for small integers, because INT32_MIN is outside the -2^24 .. 2^24 range where a float can represent every integer; e.g. 5 would be rounded to the nearest multiple of 2^8 during conversion. So that's not usable on its own; you'd need to do both the straight conversion and the range-shifted conversion, and only use the range-shifted result where the straight conversion gave you 0x80000000. (Perhaps using the straight conversion result as a blend control for SSE4 blendvps?)
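A minimal sketch of both ideas in C with SSE2 intrinsics, assuming inputs in [0, 2^32) and the default round-to-nearest mode. The function names are hypothetical, and the blend is done with and/andnot/or so that only SSE2 is required (blendvps would need SSE4.1). Results for negative, NaN, or too-large inputs are not meaningful here.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Range shift only: loses precision for small inputs (shown for contrast). */
static __m128i cvtps_epu32_naive(__m128 x)
{
    const __m128i sign_bit = _mm_set1_epi32(INT32_MIN);
    __m128 shifted = _mm_add_ps(x, _mm_set1_ps(-2147483648.0f));
    return _mm_xor_si128(_mm_cvtps_epi32(shifted), sign_bit);
}

/* Straight conversion, falling back per lane to the range-shifted
 * conversion where the straight one returned 0x80000000. */
static __m128i cvtps_epu32_sse2(__m128 x)
{
    const __m128i sign_bit = _mm_set1_epi32(INT32_MIN);

    /* Correct for lanes below 2^31; 0x80000000 for lanes at or above it. */
    __m128i straight = _mm_cvtps_epi32(x);

    /* For lanes in [2^31, 2^32) the subtraction is exact in float. */
    __m128i shifted = cvtps_epu32_naive(x);

    /* All-ones in lanes where the straight conversion overflowed. */
    __m128i use_shifted = _mm_cmpeq_epi32(straight, sign_bit);

    return _mm_or_si128(_mm_and_si128(use_shifted, shifted),
                        _mm_andnot_si128(use_shifted, straight));
}

/* Array wrappers so the results are easy to inspect. */
static void floats_to_u32_naive(const float in[4], uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, cvtps_epu32_naive(_mm_loadu_ps(in)));
}

static void floats_to_u32(const float in[4], uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, cvtps_epu32_sse2(_mm_loadu_ps(in)));
}
```

With the input {5.0f, 0.0f, 2147483648.0f, 4294967040.0f}, the naive version turns the 5 into 0 because 5 - 2^31 is not representable as a float, while the blended version returns all four values unchanged.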

For packed conversion of float->int32_t, there is SSE2 cvtps2dq xmm, xmm/m128 (docs: https://www.felixcloutier.com/x86/cvtps2dq). (cvttps2dq converts with truncation toward 0, instead of using the current rounding mode, which is round-to-nearest by default if you haven't changed it.)
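The difference between the two instructions shows up on inputs with a fractional part. A small sketch using the corresponding intrinsics, assuming the default MXCSR rounding mode (round to nearest, ties to even); the wrapper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* cvtps2dq: rounds according to the current MXCSR rounding mode. */
static void cvt_round(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_loadu_ps(in)));
}

/* cvttps2dq: always truncates toward zero. */
static void cvt_trunc(const float in[4], int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(_mm_loadu_ps(in)));
}
```

For the input {1.5f, 2.5f, -1.5f, 2.7f}, `cvt_round` gives {2, 2, -2, 3} under the default mode and `cvt_trunc` gives {1, 2, -1, 2}.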

Any negative float less than -0.5 will convert to integer -1 or lower; as a uint32_t that bit-pattern represents a huge number. Floats outside the -2^31 .. 2^31-1 range get converted to 0x80000000, Intel's "integer indefinite" value.
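A short sketch that makes these cases visible, again with SSE2 intrinsics and assuming the default rounding mode. Both helper names are hypothetical. Note that the lane mask cannot tell an overflowed lane apart from an input of exactly -2^31, which also converts to 0x80000000 legitimately.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Signed conversion, with the result bits viewed as uint32_t. */
static void cvt_signed_bits(const float in[4], uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_loadu_ps(in)));
}

/* Returns a 4-bit mask: bit i is set if lane i converted to 0x80000000. */
static int indefinite_lanes(const float in[4])
{
    __m128i v = _mm_cvtps_epi32(_mm_loadu_ps(in));
    __m128i hit = _mm_cmpeq_epi32(v, _mm_set1_epi32(INT32_MIN));
    return _mm_movemask_ps(_mm_castsi128_ps(hit));
}
```

For the input {-0.25f, -1.0f, 3000000000.0f, -3000000000.0f}, the result bits are 0, 0xFFFFFFFF, 0x80000000, 0x80000000, and the mask has bits 2 and 3 set.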

If you didn't find that, only cvtps2pi signed conversion into an MMX register, you need better places to search:

  • https://stackoverflow.com/tags/sse/info - links
  • https://www.felixcloutier.com/x86/ - x86 instruction-set list.
  • https://www.officedaytime.com/simd512e/simd.html - lists of instructions by category / function
  • https://software.intel.com/sites/landingpage/IntrinsicsGuide/ - asm instruction mnemonics are listed for intrinsics that only expose the functionality of a single instruction. And normally you're better off writing C with intrinsics than asm by hand, especially if you don't already know about relatively common / simple instructions like cvtps2dq and cvttps2dq.
  • https://agner.org/optimize/ - his asm optimization guide has a chapter on SIMD with a handy table of different kinds of data-movement instructions.
  • "How can I convert an XMM register of single-precision floats to integers?" - a pointer in the right direction, but covering only signed conversion. I didn't find an exact duplicate.
