Query about legacy 3DNow! instruction set

Question

Just for fun, I'm reviewing the legacy (deprecated) instructions from the 3DNow! set introduced by AMD, and I'm trying to understand how they were used. All instructions seem to be encoded following this pattern:

instruction destination_MMn_register_operand, source_MMn_register_or_memory_operand

where destinationRegister = destinationRegister <operation> source

For example, pfadd mm0, mmword ptr [rcx] (0F 0F 01 9E):

Would add 2 packed floats from memory pointed by rcx to 2 packed floats stored in mm0 and keep result in mm0.
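
The byte sequence quoted above can be reproduced with a small sketch. This is only an illustration, assuming the usual 3DNow! layout of 0F 0F, a ModRM byte, then a one-byte opcode suffix; the helper names and the small register table are hypothetical, and only the simple [base] and register-to-register forms are handled (no SIB byte, displacement, or REX prefix).

```python
# 3DNow! opcode suffix bytes: the trailing byte selects the operation.
SUFFIX = {"pfadd": 0x9E, "pfsub": 0x9A, "pfmul": 0xB4}

# ModRM.rm values for a few base registers. rsp and rbp are left out because
# with mod=00 they select a SIB byte and a disp32 form instead of [base].
BASE = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsi": 6, "rdi": 7}


def encode_3dnow_mem(mnemonic, mm_dst, base):
    """Encode `mnemonic mmN, mmword ptr [base]` with no displacement."""
    if not 0 <= mm_dst <= 7:
        raise ValueError("mm register must be 0..7")
    modrm = (0b00 << 6) | (mm_dst << 3) | BASE[base]  # mod=00 means [base]
    return bytes([0x0F, 0x0F, modrm, SUFFIX[mnemonic]])


def encode_3dnow_reg(mnemonic, mm_dst, mm_src):
    """Encode `mnemonic mmN, mmM` (register source)."""
    if not (0 <= mm_dst <= 7 and 0 <= mm_src <= 7):
        raise ValueError("mm registers must be 0..7")
    modrm = (0b11 << 6) | (mm_dst << 3) | mm_src  # mod=11 means register
    return bytes([0x0F, 0x0F, modrm, SUFFIX[mnemonic]])


print(encode_3dnow_mem("pfadd", 0, "rcx").hex(" "))
```

The destination always sits in the ModRM reg field, which is why every instruction of this form writes an mm register.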

So it seems like those 3DNow instructions always have an mm register as a destination.

But how were you supposed to get the results out of those mm registers?

In other words, there's no mov mmword ptr [rcx], mm0, or mov rax, mm0 instructions.

Recommended answer

As @harold says, storing to memory is already covered by MMX movd, or pshufw+movd to extract just the high float.
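
A rough software model of that extraction, treating an mm register as a 64-bit integer with element 0 in the low dword. The function names mirror the instructions but are only an emulation written for illustration, and the 0xEE shuffle constant is one possible choice that copies the high dword into the low half.

```python
import struct


def pack2(lo, hi):
    """Build a 64-bit mm-register value from two floats (element 0 in the low dword)."""
    return struct.unpack("<Q", struct.pack("<2f", lo, hi))[0]


def movd(mm):
    """Model `movd dword [mem], mm`: store the low 32 bits and read them as a float."""
    return struct.unpack("<f", struct.pack("<I", mm & 0xFFFFFFFF))[0]


def pshufw(src, imm8):
    """Model `pshufw mm, mm, imm8`: result word i is source word (imm8 >> 2*i) & 3."""
    out = 0
    for i in range(4):
        sel = (imm8 >> (2 * i)) & 3
        word = (src >> (16 * sel)) & 0xFFFF
        out |= word << (16 * i)
    return out


mm0 = pack2(1.5, -2.25)
low = movd(mm0)                  # low float
high = movd(pshufw(mm0, 0xEE))   # words 2,3 moved down to words 0,1, then movd
print(low, high)
```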

The one thing you can't do is turn a 3dNow! float into an x87 80-bit float without a store/reload.

What might have been potentially useful is a version of EMMS that expands a 32-bit float into an 80-bit x87 long double in st0, along with setting the FPU back into x87 mode instead of MMX mode (see footnote 1). Or maybe even do that for multiple mm registers into multiple x87 registers?

i.e. it would be a shortcut for movd dword [esp], mm0 / emms / fld dword [esp] to set up for further scalar FP after a SIMD reduction.
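
To make the widening step concrete, here is a sketch of what loading a 32-bit float into the 80-bit x87 format amounts to at the bit level: rebias the exponent from 127 to 16383 and make the leading 1 of the significand explicit. The function name is hypothetical, and the sketch assumes a normal (or zero) input; denormals, infinities and NaNs are deliberately rejected.

```python
import struct


def float32_to_x87(value):
    """Return (sign_and_exponent_16bit, significand_64bit) for a normal float32 or zero."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 and frac == 0:
        return sign << 15, 0  # signed zero
    if exp == 0 or exp == 0xFF:
        raise ValueError("denormals, infinities and NaNs are not handled in this sketch")
    ext_exp = exp - 127 + 16383             # rebias the exponent
    significand = (1 << 63) | (frac << 40)  # explicit integer bit plus 23 fraction bits
    return (sign << 15) | ext_exp, significand


se, sig = float32_to_x87(1.0)
print(hex(se), hex(sig))
```

Every float32 value is exactly representable in the 80-bit format, so the conversion never rounds.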

Remember that these are IEEE754 floats; you normally don't want them in integer registers unless you're picking apart their bit-fields (e.g. for an exp or log implementation), but you can do that with MMX shift/mask instructions.
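
As an illustration of that kind of bit-field work, the sketch below models a per-dword logical right shift (psrld) and a bitwise AND (pand) on a 64-bit value holding two packed floats, and uses them to pull out both exponents at once. The helpers are plain Python emulations written for this example, not real intrinsics.

```python
import struct

MASK32 = 0xFFFFFFFF


def pack2(lo, hi):
    """Two floats packed into one 64-bit value, element 0 in the low dword."""
    return struct.unpack("<Q", struct.pack("<2f", lo, hi))[0]


def psrld(mm, count):
    """Model `psrld mm, count`: logical right shift of each 32-bit lane."""
    lo = (mm & MASK32) >> count
    hi = ((mm >> 32) & MASK32) >> count
    return (hi << 32) | lo


def pand(a, b):
    """Model `pand`: bitwise AND of the full 64 bits."""
    return a & b


def packed_exponents(mm):
    """Unbiased exponents of both lanes (assumes normal, finite inputs)."""
    biased = pand(psrld(mm, 23), 0x000000FF000000FF)  # drop mantissa, mask off sign
    return (biased & MASK32) - 127, (biased >> 32) - 127


print(packed_exponents(pack2(8.0, 0.5)))
```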

But movd and fld are cheap, so they didn't bother making a special instruction just to save the reload latency. Also, it might have been slow to implement as a single instruction. Even though x86 is not a RISC ISA, one really complex instruction is often slower than multiple simpler instructions (especially before decoding to multiple uops was fully a thing). For example, Intel's and AMD's sysenter and syscall instructions, which replace int 0x80 for system calls, require additional instructions before/after to save more state, but are still faster overall.

3dNow!'s femms leaves the MMX/3dNow! register contents undefined, only setting the tag words to unused instead of preserving the mapping from MMX registers to/from x87 register contents. See http://refspecs.linuxbase.org/AMD-3Dnow.pdf for an official AMD manual. IDK if AMD's microarchitectures just dropped the register-renaming info or what, but probably making store / femms / x87-load the fast way saves a lot of transistors.

Or maybe even FEMMS is still somewhat slow, so they didn't want to encourage coders to leave/re-enter MMX/3dNow! mode very often.

Fun fact: 3dNow! PREFETCHW (prefetch with write intent) is still used, and has its own CPUID feature bit.

See my answer on "What is the effect of second argument in _builtin_prefetch()?"

Intel CPUs soon added support for decoding it as a NOP (so software like 64-bit Windows can use it without checking), but Broadwell and later actually prefetch with a RFO to get the cache line in MESI Exclusive state, rather than Shared, so it can flip to Modified without additional off-core traffic.

The CPUID feature bit indicates that it really will prefetch.
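
A small sketch of how those feature bits can be decoded. It assumes the register values come from CPUID leaf 0x80000001, where ECX bit 8 reports PREFETCHW and EDX bits 30 and 31 report extended 3DNow! and 3DNow!; Python cannot execute CPUID itself, so the raw values are passed in. The second helper assumes Linux, where /proc/cpuinfo reports the flag as "3dnowprefetch". Both function names are hypothetical.

```python
def decode_ext_features(ecx, edx):
    """Decode a few bits of CPUID leaf 0x80000001 from raw register values."""
    return {
        "prefetchw": bool(ecx & (1 << 8)),   # 3DNow! PREFETCH/PREFETCHW
        "3dnowext": bool(edx & (1 << 30)),   # extended 3DNow!
        "3dnow": bool(edx & (1 << 31)),      # 3DNow!
    }


def cpuinfo_has_prefetchw(cpuinfo_text):
    """Look for the `3dnowprefetch` flag in the text of /proc/cpuinfo (Linux only)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return "3dnowprefetch" in line.split(":", 1)[1].split()
    return False


# Made-up register values: PREFETCHW present, 3DNow! itself absent.
print(decode_ext_features(ecx=0x00000121, edx=0x2C100800))
```

On a Linux machine the second helper could be fed open("/proc/cpuinfo").read().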

Footnote 1:

Remember that the MMX registers alias the x87 registers, so no new OS support was needed to save/restore architectural state on context switches. It wasn't until SSE that we got new architectural state. So it wasn't until SSE2+3dNow! that a 3dNow! float to SSE2 double could make sense without switching back to x87 mode. And you could movq2dq xmm0, mm0 + cvtps2pd xmm0, xmm0.

They could have had a float->double in a mm register, but the fld / fst hardware was only designed for float or double->80-bit and 80-bit->float or double. And the use-case for that is limited; if you're using 3dNow!, just stick to float.
