Can ASM code be assembled in more than one possible way (except for offset values)?


Question

I don't know x86 ASM very well, but I'm rather comfortable with SHARP-z80, and I know from experience that each instruction (mnemonic) has a corresponding byte/word value, and that by looking at the hex dump of the assembled binary file I can "read back" the same code I wrote using mnemonics.

In another SO question, somebody claimed that there are some situations where ASM instructions are not translated to their corresponding binary value, but instead are rearranged in a different way by the assembler.

I'm looking especially for cases where disassembling the binary would result in a different ASM code than the original one.

In other words, are there any cases where the assembly source does not map 1:1 to the assembled code?

MikeKwan linked to another question where GCC would modify inline ASM code (in a C project). That's an interesting topic, but it doesn't answer this question: GCC is a compiler, it always tries to optimize code, and its translation of inline ASM is affected by the surrounding C code.

Answer

To the extent that the assembler's designers think it is helpful, an assembler may substitute equivalent instructions that have other, useful properties.

First, there are machines with variable-length operand fields. If a value/offset will fit into any of several variants, it is common for the assembler to substitute the shortest. (In such assemblers, it is also common to be able to force a particular size.) This is true of instructions that involve immediate operands and indexed addressing.
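As a concrete illustration of "substitute the shortest", here is a minimal sketch using two 32-bit x86 encodings of `add eax, imm`. The function name and the `force_long` flag are made up for this example; a real assembler has its own syntax for forcing a size.

```python
import struct

def encode_add_eax(imm: int, force_long: bool = False) -> bytes:
    """Encode 32-bit x86 `add eax, imm`, preferring the shortest form.

    `force_long` is a hypothetical stand-in for an assembler's
    "force a particular size" override.
    """
    if not force_long and -128 <= imm <= 127:
        # 83 /0 ib: ADD r/m32, sign-extended imm8; ModRM byte 0xC0 selects EAX
        return bytes([0x83, 0xC0]) + struct.pack("<b", imm)
    # 05 id: ADD EAX, imm32 (this sketch only handles signed 32-bit values)
    return bytes([0x05]) + struct.pack("<i", imm)

print(encode_add_eax(1).hex())        # short form, 3 bytes
print(encode_add_eax(1, True).hex())  # long form, 5 bytes, same meaning
```

Both byte sequences disassemble to `add eax, 1`, so the source text alone does not determine which bytes end up in the binary.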

Many machines have instructions with PC-relative offsets, commonly for JMPs, sometimes for load/store/arithmetic instructions. An assembler encountering such an instruction during the first pass can determine whether the addressed operand precedes the instruction or has not been seen yet. If it precedes, the assembler can choose a short relative form or a long relative form because it knows the offset. If it follows, the assembler doesn't know the offset yet, and generally reserves a large offset field that it fills in during pass 2. As with operand sizes, there tend to be ways to force the assembler to choose the short form.
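The two-pass behaviour described here can be sketched with a toy assembler. Everything about the instruction set below is an assumption made for illustration: the opcode values `0x10` (short jump, 1-byte offset) and `0x11` (long jump, 2-byte offset) and the one-byte `nop` are hypothetical.

```python
SHORT_JMP, LONG_JMP, NOP = 0x10, 0x11, 0x00  # hypothetical opcodes

def assemble(program):
    """program: list of ("label", name), ("nop",) or ("jmp", name)."""
    labels, sized, pc = {}, [], 0
    # Pass 1: assign addresses and decide each jump's size.
    for item in program:
        if item[0] == "label":
            labels[item[1]] = pc
            continue
        if item[0] == "nop":
            size = 1
        else:
            target = labels.get(item[1])  # None for a forward reference
            # Backward reference: the offset (relative to the end of a
            # 2-byte short jump) is known, so the short form can be chosen.
            if target is not None and target - (pc + 2) >= -128:
                size = 2
            else:
                size = 3  # unknown or too far: reserve the long form
        sized.append((pc, size, item))
        pc += size
    # Pass 2: all labels are known, so fill in the offsets.
    out = bytearray()
    for addr, size, item in sized:
        if item[0] == "nop":
            out.append(NOP)
            continue
        rel = labels[item[1]] - (addr + size)
        if size == 2:
            out += bytes([SHORT_JMP]) + rel.to_bytes(1, "little", signed=True)
        else:
            out += bytes([LONG_JMP]) + rel.to_bytes(2, "little", signed=True)
    return bytes(out)

prog = [("label", "top"), ("nop",), ("jmp", "top"),
        ("jmp", "end"), ("nop",), ("label", "end")]
print(assemble(prog).hex())
```

In this run the forward `jmp end` gets the long form even though its final offset would have fit in one byte, which is exactly the kind of result a programmer who only wrote `jmp` would not see in the source.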

Some machines don't have long relative jump instructions. In this case, the assembler will emit a short relative jmp backwards if the target precedes the jmp and is close by. If the target precedes but is far away, or the target is a forward reference, the assembler may insert a short relative jmp on the opposite branch condition, whose target is just past the next instruction, followed by a long absolute jmp. (I've personally built assemblers like this.) This ensures that jmps can always reach their target.
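A rough sketch of that branch-inversion rewrite follows. The mnemonics are borrowed from 6502-style CPUs purely as an example, and the instruction sizes (2-byte short branch, 3-byte absolute `JMP`) and the function name are assumptions of this sketch, not something the answer specifies.

```python
# Each conditional branch paired with its opposite condition.
INVERSE = {"BEQ": "BNE", "BNE": "BEQ", "BCC": "BCS", "BCS": "BCC"}

def expand_branch(cond: str, target: int, pc: int, target_known: bool):
    """Return the instructions actually emitted for `cond target` at `pc`.

    Assumed sizes: short conditional branch = 2 bytes, absolute JMP = 3 bytes.
    """
    if target_known and -128 <= target - (pc + 2) <= 127:
        return [f"{cond} {target:#06x}"]  # the short branch reaches
    # Otherwise: branch on the opposite condition over an absolute JMP.
    skip_to = pc + 2 + 3
    return [f"{INVERSE[cond]} {skip_to:#06x}", f"JMP {target:#06x}"]

print(expand_branch("BNE", 0x1000, 0x1010, True))
print(expand_branch("BNE", 0x2000, 0x1010, True))
```

Disassembling the second case yields two instructions where the source had one, but the result is still a valid program with the same behaviour.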

The good news about these tricks is that if you disassemble, you still get a valid assembly program.

Now let's turn to the ones that will confuse your disassembler.

A trick similar to the relative-jump one may be used for literal operands if the machine has short-relative addressing for load/store instructions and the programmer specifies loading a constant or a value that is a long way away. In this case the assembler changes the instruction to refer to a literal or an address constant placed nearby, and inserts a short relative jmp around that constant. The disassembler thinks everything in the instruction stream is an instruction; in this case the literal value is not, and that would throw the disassembler off. At least there's an unconditional jmp around the literal to guide the disassembler.

You may find screwier tricks in mature assemblers, where every stunt ever imagined is supported. One of my favorites on 8-bit assemblers was the pair of "pseudo" instructions SKIP1 and SKIP2, which you can think of as extremely short relative branches. They were really just the opcode byte of the "CMP #8bits" and "CMP #16bits" instructions, and were used to jump around an 8-bit or 16-bit instruction respectively: the CPU consumes the skipped instruction's bytes as the CMP's immediate operand. So, a "one byte" relative jump rather than two. When you're squeezed for space, every byte counts :-{

      SKIP1
      INC    ; 8 bit instruction
      ...

This was also handy when trying to implement a loop where some step shouldn't be performed on loop entry, but needs to be done on further loop iterations:

      SKIP2
LOOP: SHLD  ; 16 bit instruction
      ...
      BNE LOOP

The issue here is that if you disassemble the SKIP1 or SKIP2 instructions, you won't see the INC (or the corresponding 16-bit instruction).
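To make the SKIP1 effect visible, here is a small linear-sweep disassembler sketch. The answer does not name a CPU, so this example assumes a tiny subset of 6502 opcodes (`C9` = `CMP #imm`, `E8` = `INX`, `EA` = `NOP`, `60` = `RTS`), with `INX` standing in for the one-byte instruction being skipped.

```python
# opcode -> (mnemonic, number of operand bytes); a tiny 6502 subset
OPCODES = {
    0xC9: ("CMP #", 1),  # doubles as "SKIP1": swallows the next byte
    0xE8: ("INX", 0),
    0xEA: ("NOP", 0),
    0x60: ("RTS", 0),
}

def disassemble(code: bytes, start: int = 0):
    """Decode instructions one after another, beginning at `start`."""
    out, i = [], start
    while i < len(code):
        name, n = OPCODES[code[i]]
        operand = code[i + 1 : i + 1 + n]
        out.append(name + (f"${operand[0]:02X}" if n else ""))
        i += 1 + n
    return out

code = bytes([0xC9, 0xE8, 0xEA, 0x60])  # SKIP1 / INX / NOP / RTS
print(disassemble(code))      # sweep from the SKIP1 byte: INX is hidden
print(disassemble(code, 1))   # entry at the skipped instruction: INX appears
```

The same bytes decode differently depending on the entry point, which is why a disassembler that starts at the SKIP1 byte never shows the skipped instruction.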

A trick used by assembly language programmers for passing parameters is to place them inline after the call, with the proviso that the called routine adjust the return address appropriately:

      CALL   foo
      DC     param1
      DC     param2

Or:

      CALL   printstring
      DC     "a variable length string",0

There is no practical way for a disassembler to know that such a convention is being used, or what that convention is, so the disassembler is bound to handle this wrong.
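The return-address adjustment in the `printstring` case can be modelled in a few lines. This is only a sketch of the convention: the function name, the memory layout, and the use of Z80 opcodes (`CD` = `CALL nn`, `76` = `HALT`) are assumptions chosen for the example.

```python
def print_inline_string(memory: bytes, return_addr: int):
    """Model a callee that reads a 0-terminated string stored right after
    the CALL, then returns to the first byte past the terminator."""
    end = memory.index(0, return_addr)          # find the terminating 0
    text = memory[return_addr:end].decode("ascii")
    return text, end + 1                        # adjusted return address

# CALL 0x1000, then the inline string "hi",0, then HALT
memory = bytes([0xCD, 0x00, 0x10]) + b"hi\x00" + bytes([0x76])
text, resume = print_inline_string(memory, return_addr=3)
print(text, resume, hex(memory[resume]))
```

Execution resumes at the `HALT`, but a disassembler sweeping straight through would try to decode the bytes of `"hi",0` as instructions, because nothing in the binary says they are data.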
