How to mov values in eax register, ah and al left by 2 bytes? x86 Assembly

Question

I have a question about how to move values around in the x86 Assembly eax register. I know that the 32-bit register breaks down into smaller component registers, with the lower 16 bits being ax, and that those 16 bits break down even further into the 8-bit registers ah and al.

I'm currently writing a program for an x86 Assembly Language assignment that wants me to move four 8-bit hex values around in the register using only the mov, add, and sub instructions. The program starts by having you shift the values of the variables around by adding and subtracting them, and that's no problem.

The second part (phase2) is to put each of the values into each of the eax 8-bit positions. But I know you can only access the lower two 8-bit positions ("ah" and "al"). I need to somehow move ah and al together into the leading 16 bits of eax, pushing the values added to ah and al left by two byte positions? (Question mark, because I do not know.) I am fairly certain that I can then just add the correct values back to ah and al to finish the solution.

I believe the way to do this is to add "some hex value" to ah and have that overflow left, but I can't seem to wrap my head around the logic of it. "Logically," I want to say this seems like the best course of action, but I'm not sure how to implement it. And since I can't wrap my head around it, I can't find the hidden algorithm I'm supposed to find. Phase2 is only supposed to be approximately 21 lines, so I know it is not a massive column of add instructions.

Any direction on how to think about this would be highly appreciated. Thanks to whomever.

.386
.model flat,stdcall
.stack 4096
ExitProcess proto,dwExitCode:dword

.data
    var1 BYTE 'A'
    var2 BYTE 'B'
    var3 BYTE 'C'
    var4 BYTE 'D'
    
.code
main proc
;phase1
mov al, var1; store 'A'
mov ah, var4; store 'D'
mov var1, ah; move 'D' to var1
sub ah, 1; make ah 'C'
mov var4, ah; move 'C' to var4
sub ah, 1; make ah 'B'
mov var3, ah; move 'B' to var3
mov var2, al; move 'A' to var2

    ;var1 BYTE 'D'
    ;var2 BYTE 'A'
    ;var3 BYTE 'B'
    ;var4 BYTE 'C'


;phase2
mov ah, var1; store 'D'
mov al, var2; store 'A'

; this is where I want to shift al and ah left two bytes 
; once the first two bytes of eax equal 'DA' move 'B' 'C' 
; into ah and al

mov ah, var3; store 'B'
mov al, var4; store 'C'

;eax should read 'DABC' = 44414243
    
    invoke ExitProcess,0
main endp
end main

Recommended answer

If you can't use shl eax, 16 like a normal person, your other options include the following:

  • add eax,eax repeated 16 times (yuck, slow), either fully unrolled or in a partially unrolled loop.
  • store / reload at an offset: also slow, but only in terms of latency (store-forwarding stall). Throughput can be OK, while latency is pretty close to the same 16 cycles as the 16x add way on a typical modern x86.
    sub  esp, 16             ; reserve some stack space.

    ...
    mov  [esp+2], ax         ; 2 byte store
    mov  eax, [esp]          ; 4-byte reload with previous AX in the top half
    
    mov  ah, ...             ; overwrite whatever garbage in the low 2 bytes
    mov  al, ...

x86 is little-endian, so a load/store of EAX at addr loads/stores AL at that same addr and AH at addr+1, with the high 2 bytes coming from addr+2 and addr+3.
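As a rough illustration of that byte layout, here is a sketch of how the store/reload trick might slot into the phase2 code from the question, using only mov. It assumes phase1 has already run (so var1..var4 hold 'D', 'A', 'B', 'C'), and it uses a hypothetical 4-byte variable named scratch that is not in the original program and would have to be added to the .data section.

```asm
; Sketch only. Assumes this extra line in .data (hypothetical name):
;     scratch DWORD 0

    mov  ah, var1                   ; AH = 'D' (44h)
    mov  al, var2                   ; AL = 'A' (41h), so AX = 4441h

    mov  word ptr [scratch+2], ax   ; little-endian: scratch+2 = 41h (AL), scratch+3 = 44h (AH)
    mov  eax, scratch               ; 4-byte reload: EAX = 44410000h (low half is scratch+0..1, here 0)

    mov  ah, var3                   ; AH = 'B' (42h)
    mov  al, var4                   ; AL = 'C' (43h)
                                    ; EAX = 44414243h, i.e. 'DABC'
```

A static variable is used here instead of stack space only to keep the fragment short; the stack-based form shown in the answer works the same way.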

Reading EAX after writing AH and AL will also force the CPU to merge partial registers if it renamed AH (and maybe AL) separately from the full EAX. But clearly, if you're restricting yourself to a tiny subset of the ISA, then high performance isn't your top goal. (For more details, see "Why doesn't GCC use partial registers?" and "How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent".)

For the store-forwarding stall part, see "Can modern x86 implementations store-forward from more than one prior store?"

Depending on how much you're doing with the new low part (the new AH and AL), you might actually compute those values in a separate register (like DH and DL), so out-of-order exec can get started on that work without a false dependency on the store-forwarding reload. This matters especially on CPUs that don't rename AL (or even AH) separately from EAX (i.e. CPUs other than Intel P6-family ones such as crusty old Nehalem).

So you'd do this:

    mov  [esp+2], ax         ; 2 byte store
    mov  eax, [esp]          ; 4-byte reload with previous AX in the top half
    
    mov  dl, ...
    mov  dh, ...
    ; ... more computation with these two

    mov  ax, dx              ; replace low 2 bytes of EAX

mov ax,dx might need to wait for the old EAX value to be "ready", i.e. for the reload to complete, so it can merge into it as part of running that instruction. (That is the case on Intel Sandybridge-family and on all non-Intel CPUs.) Doing the work in DL/DH first therefore lets those computations overlap with the store-forwarding latency.
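Filled in with the variables from the question, the DX variant might look like the sketch below. It assumes phase1 has already run, that DX is free to clobber, and that 4 bytes of stack scratch space are enough; the final add esp, 4 is an addition that releases the reserved space, which the fragments above leave out.

```asm
; Sketch only: DX variant applied to the question's phase2 (mov, add, sub only).

    sub  esp, 4              ; reserve 4 bytes of scratch space on the stack

    mov  ah, var1            ; AH = 'D' (44h)
    mov  al, var2            ; AL = 'A' (41h)
    mov  [esp+2], ax         ; 2-byte store into the upper half of the scratch dword
    mov  eax, [esp]          ; 4-byte reload: upper half = 4441h, lower half = leftover stack bytes

    mov  dh, var3            ; DH = 'B' (42h), independent of the reload above
    mov  dl, var4            ; DL = 'C' (43h)
    mov  ax, dx              ; replace the low 2 bytes: EAX = 44414243h

    add  esp, 4              ; release the scratch space
```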

Just to be clear, all this discussion about tradeoffs is about performance, not correctness; all the ways I've shown here are fully correct. (Unless I made a mistake :P)
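For comparison, the first option (doubling EAX 16 times) can be sketched as follows for the question's phase2, again assuming phase1 has already run. Each add eax,eax shifts EAX left by one bit, so 16 of them move AX into the upper half and leave zeros in the lower half; whatever was in the upper half beforehand is shifted out.

```asm
; Sketch only: shift AX into the top half of EAX using nothing but add.

    mov  ah, var1            ; AH = 'D' (44h)
    mov  al, var2            ; AL = 'A' (41h), AX = 4441h; upper half of EAX is don't-care

    add  eax, eax            ; shift left by 1 bit (1 of 16)
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax            ; 8 doublings: shifted left by one byte
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax
    add  eax, eax            ; 16 doublings: EAX = 44410000h

    mov  ah, var3            ; AH = 'B' (42h)
    mov  al, var4            ; AL = 'C' (43h), EAX = 44414243h, i.e. 'DABC'
```

At 20 instructions, this fully unrolled form happens to be close to the roughly 21 lines the question mentions, though whether it is the intended solution is a guess.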
