Which MOV instructions in x86 are unused or least used, and could be used for a custom MOV extension?

Question

I am modelling a custom MOV instruction for the x86 architecture in the gem5 simulator. To test its implementation on the simulator, I need to compile my C code with inline assembly to create a binary. But since it is a custom instruction that GCC does not implement, the compiler will throw an error. I know one way is to extend GCC to accept my custom x86 instruction, but I do not want to do that yet because it is more time-consuming (I will do it afterwards).

As a temporary hack (just to check whether my implementation is worth it), I want to take an existing MOV instruction and change its underlying "micro-ops" in the simulator, so as to trick GCC into accepting my "custom" instruction and compiling it.

There are many types of MOV instruction available in the x86 architecture, as listed in the x86 architecture reference.

So, coming to my question: which MOV instruction is the least used, so that I can edit its underlying micro-ops? Assume my workload only uses integers, i.e. it most probably won't use the xmm and mmx registers, and that my instruction mirrors the implementation of a MOV instruction.

Recommended answer

Your best bet is regular mov with a prefix that GCC will never emit on its own. i.e. create a new mov encoding that includes a mandatory prefix in front of any other mov.

Or if you're modifying GCC and as, you can add a new mnemonic that just uses otherwise-invalid (in 64-bit mode) single-byte opcodes for memory-source, memory-dest, and immediate-source versions of mov. AMD64 freed up several opcodes, including the BCD instructions like AAM, and push/pop of most segment registers. (You can still mov to/from Sregs, but those don't waste 1 opcode per Sreg.)

Assuming my workload just includes integers i.e. most probably wont be using the xmm and mmx registers

Bad assumption for XMM: GCC aggressively uses 16-byte movaps / movups instead of copying structs 4 or 8 bytes at a time. It's not at all rare to find vector mov instructions in scalar integer code as part of inline expansion of small known-length memcpy or struct / array init. Also, those mov instructions have at least 2-byte opcodes (SSE1 0F 28 movaps, so a prefix in front of plain mov is the same size as your idea would have been).

However, you're right about MMX regs. I don't think modern GCC will ever emit movq mm0, mm1 or use MMX at all, unless you use MMX intrinsics. Definitely not when targeting 64-bit code.

Also, mov to/from control registers (0F 20 /r and 0F 22 /r) or debug registers (0F 21 /r and 0F 23 /r) both use the mov mnemonic, but GCC will definitely never emit either on its own. They are only available with a GP register as the operand that isn't the debug or control register. So that's technically the answer to your title question, but probably not what you actually want.

GCC doesn't parse its inline asm template string, it just includes it in its asm text output to feed to the assembler after substituting for %number operands. So GCC itself is not an obstacle to emitting arbitrary asm text using inline asm.

And you can use .byte to emit arbitrary machine code.

Perhaps a good option would be to use a 0E byte as a prefix for your special mov encoding, which you're going to make gem5 decode specially. 0E is push CS in 32-bit mode and invalid in 64-bit mode. GCC will never emit it either way.

Or just an F2 repne prefix; GCC will never emit repne in front of a mov opcode (where it doesn't apply), only movs. (F3 rep / repe means xrelease when used on a memory-destination instruction so don't use that. https://www.felixcloutier.com/x86/xacquire:xrelease says that F2 repne is the xacquire prefix when used with locked instructions, which doesn't include mov to memory so it will be silently ignored there.)

As usual, prefixes that don't apply have no documented behaviour, but in practice CPUs that don't understand a rep / repne ignore it. Some future CPU might understand it to mean something special, and that's exactly what you're doing with gem5.
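On the simulator side, the decoder has to recognise the marker prefix followed by a mov opcode. The following is only a rough standalone sketch of that check, not gem5 code: the function name looks_like_marked_mov is hypothetical, it only covers the 88..8B and C6/C7 mov opcodes, and it assumes F2 is the first byte with at most a REX prefix between it and the opcode (other legacy prefixes such as 66 or segment overrides are not handled).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of gem5): returns true if the bytes start
// with an F2 marker prefix followed, optionally via a REX prefix, by one of
// the common integer MOV opcodes.
bool looks_like_marked_mov(const std::uint8_t* code, std::size_t len) {
    std::size_t i = 0;
    if (len == 0 || code[i] != 0xF2) return false;  // marker prefix must come first
    ++i;
    if (i < len && (code[i] & 0xF0) == 0x40) ++i;   // skip a REX prefix (64-bit mode)
    if (i >= len) return false;

    const std::uint8_t op = code[i];
    if (op >= 0x88 && op <= 0x8B) return true;      // mov r/m,r and mov r,r/m
    if (op == 0xC6 || op == 0xC7) {
        // mov r/m,imm is the /0 form: the ModRM reg field must be 0
        return i + 1 < len && ((code[i + 1] >> 3) & 7) == 0;
    }
    return false;                                   // e.g. F2 A5 (repne movs)
}
```

A real decoder would fold this into its existing prefix handling rather than pre-scanning bytes; the sketch is only meant to show which byte patterns the marker scheme produces.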

Picking .byte 0x0e; instead of repne; might be a better choice if you want to guard against accidentally leaving these prefixes in a build you run on a real CPU. (It will #UD -> SIGILL in 64-bit mode, or usually crash from messing up the stack in 32-bit mode.) But if you do want to be able to run the exact same binary on a real CPU, with the same code alignment and everything, then an ignored REP prefix is ideal.
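The trade-off between the two marker bytes could be expressed as a build switch. This is a minimal sketch, assuming a hypothetical macro GEM5_ONLY_BUILD and a hypothetical helper marked_store; neither comes from gem5 or GCC. The prefix is emitted with .byte so the assembler does not need to know about it, and on non-x86 hosts the sketch falls back to a plain assignment so it still compiles.

```cpp
// Hypothetical build switch: define GEM5_ONLY_BUILD to use the 0x0E marker,
// which faults (#UD) on a real CPU in 64-bit mode. The default is F2 (repne),
// which real CPUs ignore in front of a plain mov in practice.
#if defined(GEM5_ONLY_BUILD)
#define MARKER_PREFIX ".byte 0x0e\n\t"
#else
#define MARKER_PREFIX ".byte 0xf2\n\t"
#endif

// Hypothetical helper: store a 32-bit value through the marked mov encoding.
inline void marked_store(int& dst, int src) {
#if defined(__x86_64__) || defined(__i386__)
    // AT&T syntax; the "l" suffix is needed for the immediate-to-memory form.
    asm(MARKER_PREFIX "movl %1, %0"
        : "=m"(dst)
        : "ri"(src));
#else
    dst = src;  // non-x86 fallback so the sketch still builds
#endif
}
```

With the default F2 marker, a call such as marked_store(x, 42) should behave like an ordinary store on real hardware; with GEM5_ONLY_BUILD defined, the binary is only meant to run under the modified simulator.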

Using a prefix in front of a standard mov instruction has the advantage of letting the assembler encode the operands for you:

template<class T>
void fancymov(T& dst, T src) {
    // fixme: imm -> mem  needs a size suffix, defeating template
    // unless you use Intel-syntax where the operand includes "dword ptr"
    asm("repne; movl  %1, %0"
#if 1
       : "=m"(dst)
       : "ri" (src)
#else
       : "=g,r"(dst)
       : "ri,rmi" (src)
#endif
       : // no clobbers
    );
}

void test(int *dst, long src) {
    fancymov(*dst, (int)src);
    fancymov(dst[1], 123);
}

(Multi-alternative constraints let the compiler pick either reg/mem destination or reg/mem source. In practice it prefers the register destination even when that will cost it another instruction to do its own store, so that sucks.)

On the Godbolt compiler explorer, for the version that only allows a memory-destination:

test(int*, long):
        repne; movl  %esi, (%rdi)       # F2 89 37
        repne; movl  $123, 4(%rdi)      # F2 C7 47 04 7B 00 00 00
        ret

If you wanted this to be usable for loads, I think you'd have to make 2 separate versions of the function and use the load version or store version manually, where appropriate, because GCC seems to want to use reg,reg whenever it can.
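One way the two separate versions might look is sketched below. The names fancy_store and fancy_load are made up for illustration, the sketch is fixed to 32-bit int rather than templated, and it assumes the F2 (repne) marker. Forcing an "m" constraint on the memory side keeps GCC from choosing the reg,reg form; on non-x86 hosts it falls back to a plain copy so it still compiles.

```cpp
// Store version: memory destination only, register or immediate source.
inline void fancy_store(int& dst, int src) {
#if defined(__x86_64__) || defined(__i386__)
    asm("repne; movl %1, %0"
        : "=m"(dst)      // destination must be memory
        : "ri"(src));
#else
    dst = src;           // non-x86 fallback so the sketch still builds
#endif
}

// Load version: memory source only, register destination.
inline int fancy_load(const int& src) {
#if defined(__x86_64__) || defined(__i386__)
    int result;
    asm("repne; movl %1, %0"
        : "=r"(result)
        : "m"(src));     // source must be memory
    return result;
#else
    return src;
#endif
}
```

The caller then picks fancy_load or fancy_store by hand at each site, which is the manual selection the paragraph above describes.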

Or with the version allowing register outputs (or another version that returns the result as a T, see the Godbolt link):

test2(int*, long):
        repne; mov  %esi, %esi
        repne; mov  $123, %eax
        movl    %esi, (%rdi)
        movl    %eax, 4(%rdi)
        ret
