What is the point of SSE2 instructions such as orpd?
Question
The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of having it?
Recommended answer
Remember that SSE1 orps came first. (Well, actually MMX por mm, mm/mem came even before SSE1.)
Having the same opcode with a new prefix be the SSE2 orpd instruction makes sense for hardware decoder logic, I guess, just like movapd vs. movaps. Several instructions like this are redundant between the ps and pd versions, but some aren't, like addps vs. addpd, or unpcklps vs. unpcklpd being different shuffles.
The reason for SSE2 also introducing 66 0F EB /r por xmm,xmm/mem is at least partly for consistency with MMX 0F EB /r por mm, mm/mem: again the same opcode with a new mandatory prefix. Just like paddb mm, mm vs. paddb xmm, xmm.
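The "same opcode, new mandatory prefix" pattern can be seen by writing out the bytes. The sketch below is only an illustration: encode_rr is a hypothetical helper (not part of any assembler library), and it assumes register-register forms using registers 0-7, so no REX prefix is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a legacy (non-VEX) two-byte-opcode instruction
 * "op dst, src" in register-register form, for registers 0..7 only.
 * mandatory_prefix is 0 for none or 0x66.
 * Returns the number of bytes written to out. */
static size_t encode_rr(uint8_t out[4], uint8_t mandatory_prefix,
                        uint8_t opcode, unsigned dst, unsigned src)
{
    size_t n = 0;
    if (mandatory_prefix)
        out[n++] = mandatory_prefix;      /* 66 selects the SSE2 form */
    out[n++] = 0x0F;                      /* two-byte opcode escape */
    out[n++] = opcode;                    /* 56 = orps/orpd, EB = por */
    /* ModRM: mod = 11 (register), reg = dst, rm = src */
    out[n++] = (uint8_t)(0xC0 | ((dst & 7u) << 3) | (src & 7u));
    return n;
}
```

With dst = 0 and src = 1 this gives 0F 56 C1 for orps xmm0, xmm1 and 66 0F 56 C1 for orpd xmm0, xmm1; likewise 0F EB C1 for por mm0, mm1 and 66 0F EB C1 for por xmm0, xmm1.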
But it also allows for the possibility of different bypass-forwarding domains for vec-integer vs. FP. Different microarchitectures have had different behaviours for how they actually decoded and ran those different instructions. Some ran all the XMM OR instructions (por, orps, orpd) the same way, creating extra latency for forwarding between FP and SIMD-integer domains.
No CPUs have ever actually had different forwarding domains for FP-float vs. FP-double, so yes, movapd and orpd are in practice useless wastes of space that you should never use. Use the smaller orps encoding instead.
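At the intrinsics level, the three ORs are interchangeable because they produce the same 128 bits. The following is a minimal sketch, assuming an x86 target with SSE2; the function names are made up for illustration, and the casts are free reinterpretations that generate no instructions. Note that a compiler is free to pick any of the equivalent instructions regardless of which intrinsic you wrote.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE1 ones */

/* OR two vectors of doubles using the "official" pd intrinsic (orpd). */
static inline __m128d or_via_orpd(__m128d a, __m128d b)
{
    return _mm_or_pd(a, b);
}

/* Same operation through the ps intrinsic (orps, the shorter encoding). */
static inline __m128d or_via_orps(__m128d a, __m128d b)
{
    return _mm_castps_pd(_mm_or_ps(_mm_castpd_ps(a), _mm_castpd_ps(b)));
}

/* Same operation through the integer intrinsic (por). */
static inline __m128d or_via_por(__m128d a, __m128d b)
{
    return _mm_castsi128_pd(
        _mm_or_si128(_mm_castpd_si128(a), _mm_castpd_si128(b)));
}

/* Returns 1 if all 128 bits of x and y are identical. */
static inline int same_bits(__m128d x, __m128d y)
{
    __m128i eq = _mm_cmpeq_epi8(_mm_castpd_si128(x), _mm_castpd_si128(y));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

For any inputs a and b, same_bits(or_via_orpd(a, b), or_via_orps(a, b)) and same_bits(or_via_orpd(a, b), or_via_por(a, b)) both hold; only the encoding and, on some microarchitectures, the forwarding latency differ.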
(Or with VEX encoding it doesn't matter; vorps and vorpd are the same size: 2 byte prefix + opcode + modrm ...)
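To see why the VEX forms are the same size, note that the ps/pd distinction moves into the two pp bits of the VEX prefix instead of costing an extra 66 byte. A rough sketch, assuming the 2-byte (C5) VEX form with 128-bit operands and only xmm0-xmm7; vex2_byte and encode_vor are hypothetical helpers written for this illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Second byte of a 2-byte (C5) VEX prefix: [~R | ~vvvv | L | pp].
 * R and vvvv are stored inverted; pp = 0 means no implied prefix,
 * pp = 1 means an implied 66. */
static uint8_t vex2_byte(unsigned r, unsigned vvvv, unsigned l, unsigned pp)
{
    return (uint8_t)(((~r & 1u) << 7) | ((~vvvv & 0xFu) << 3) |
                     ((l & 1u) << 2) | (pp & 3u));
}

/* Encode "vorps xmm0, xmm0, xmm1" (is_pd == 0) or
 * "vorpd xmm0, xmm0, xmm1" (is_pd != 0). Returns the length in bytes. */
static size_t encode_vor(uint8_t out[4], int is_pd)
{
    out[0] = 0xC5;                               /* 2-byte VEX escape */
    out[1] = vex2_byte(0, 0, 0, is_pd ? 1u : 0u);
    out[2] = 0x56;                               /* OR opcode, shared by ps/pd */
    out[3] = 0xC1;                               /* ModRM: xmm0, xmm1 */
    return 4;
}
```

Both variants come out as 4 bytes: C5 F8 56 C1 for vorps and C5 F9 56 C1 for vorpd.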
For more about bypass delay when using por between FP math instructions like addps, or orps between SIMD-integer insns like paddb, see:
- Do I get a performance penalty when mixing SSE integer/float SIMD instructions
- What's the difference between logical SSE intrinsics?
- Difference between the AVX instructions vxorpd and vpxor
- Does using mix of pxor and xorps affect performance?
- Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?
- Choosing SSE instruction execution domains in mixed contexts - pre-Skylake, integer versions have better throughput.
And in case anyone was wondering, the answer to the other interpretation of the title: bitwise booleans on FP values are mostly used to set, clear, or toggle the sign bit, or to do stuff with cmpps/pd masks, like blending.
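These uses can be sketched with SSE intrinsics. The helper names below are invented for illustration, and the code assumes an x86 target with SSE2. The sign-bit mask is just -0.0 in every lane, and the blend uses the all-ones / all-zeros lanes that a packed compare produces.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE1 ones */

/* Clear the sign bit: |x|. _mm_andnot_pd(m, x) computes (~m) & x. */
static inline __m128d abs_pd(__m128d x)
{
    return _mm_andnot_pd(_mm_set1_pd(-0.0), x);
}

/* Toggle the sign bit: -x. */
static inline __m128d negate_pd(__m128d x)
{
    return _mm_xor_pd(x, _mm_set1_pd(-0.0));
}

/* Set the sign bit: -|x|. */
static inline __m128d force_negative_pd(__m128d x)
{
    return _mm_or_pd(x, _mm_set1_pd(-0.0));
}

/* Per-lane select: (a < b) ? x : y, using a cmpps mask with and/andnot/or. */
static inline __m128 select_lt_ps(__m128 a, __m128 b, __m128 x, __m128 y)
{
    __m128 mask = _mm_cmplt_ps(a, b);          /* all-ones where a < b */
    return _mm_or_ps(_mm_and_ps(mask, x),      /* keep x where mask is set */
                     _mm_andnot_ps(mask, y));  /* keep y elsewhere */
}
```

For example, abs_pd applied to the lanes {3.0, -2.0} yields {3.0, 2.0}, and select_lt_ps with a = {1, 2, 3, 4} and b = 2.5 in every lane takes x in the first two lanes and y in the last two.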