What was the original reason for the design of AT&T assembly syntax?

Question

When writing assembly for x86 or amd64, a programmer can use either "Intel" syntax (e.g. the nasm assembler) or "AT&T" syntax (e.g. the gas assembler). "Intel" syntax is more popular on Windows, while "AT&T" syntax is more popular on UNIX(-like) systems.

But the Intel and AMD manuals, that is, the manuals written by the creators of the chips, both use the "Intel" syntax.

I'm wondering: what was the original idea behind the design of the "AT&T" syntax? What was the benefit of moving away from the notation used by the creators of the processor?

Recommended answer

UNIX was for a long time developed on the PDP-11, a 16-bit computer from DEC with a fairly simple instruction set. Nearly every instruction has two operands, each of which can use one of the following eight addressing modes, shown here in the MACRO-11 assembly language:

0n  Rn        register
1n  (Rn)      deferred
2n  (Rn)+     autoincrement
3n  @(Rn)+    autoincrement deferred
4n  -(Rn)     autodecrement
5n  @-(Rn)    autodecrement deferred
6n  X(Rn)     index
7n  @X(Rn)    index deferred

Immediates and direct addresses can be encoded by cleverly reusing some addressing modes on R7, the program counter:

27  #imm      immediate
37  @#imm     absolute
67  addr      relative
77  @addr     relative deferred
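
The two tables above can be read as a lookup on the two octal digits of an operand specifier. A minimal Python sketch, assuming the specifier is passed as a 6-bit integer (the function name pdp11_operand is made up for illustration and is not part of any real tool):

```python
def pdp11_operand(spec):
    """Render a 6-bit PDP-11 operand specifier (mode digit, register digit) as text."""
    mode, reg = (spec >> 3) & 7, spec & 7
    if reg == 7:
        # Modes 2, 3, 6 and 7 applied to the program counter get their own spellings.
        pc_forms = {2: "#imm", 3: "@#imm", 6: "addr", 7: "@addr"}
        if mode in pc_forms:
            return pc_forms[mode]
    forms = ["R{n}", "(R{n})", "(R{n})+", "@(R{n})+",
             "-(R{n})", "@-(R{n})", "X(R{n})", "@X(R{n})"]
    return forms[mode].format(n=reg)


print(pdp11_operand(0o12))  # (R2)
print(pdp11_operand(0o03))  # R3
print(pdp11_operand(0o27))  # #imm
```

The sketch only names the addressing mode; it does not fetch the index word or immediate that follows the instruction word.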

Because the UNIX tty driver used @ and # as control characters, $ was substituted for # and * for @.

The first operand field in a PDP-11 instruction word refers to the source operand, while the second refers to the destination. The assembly language's operand order reflects this: source first, then destination. For example, the opcode

011203

refers to the instruction

mov (R2),R3

which moves the word pointed to by R2 to R3.

This syntax was adapted to the 8086 CPU and its addressing modes:

mr0 X(bx,si)  bx + si indexed
mr1 X(bx,di)  bx + di indexed
mr2 X(bp,si)  bp + si indexed
mr3 X(bp,di)  bp + di indexed
mr4 X(si)     si indexed
mr5 X(di)     di indexed
mr6 X(bp)     bp indexed
mr7 X(bx)     bx indexed
3rR R         register
0r6 addr      direct

Here m is 0 if there is no index, 1 if there is a one-byte index, 2 if there is a two-byte index, and 3 if a register is used instead of a memory operand. If there are two operands, the other operand is always a register and is encoded in the r digit. Otherwise, r encodes another three bits of the opcode.
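
The table and the meaning of the m and r digits can be sketched as a small decoder. This is an illustrative Python sketch, not an assembler's actual code: the name decode_modrm is hypothetical, the byte is assumed to be the 16-bit-mode addressing byte laid out as the octal digits m, r and the mode digit, and register operands are assumed to be word-sized.

```python
RM_FORMS = ["X(bx,si)", "X(bx,di)", "X(bp,si)", "X(bp,di)",
            "X(si)", "X(di)", "X(bp)", "X(bx)"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_modrm(byte):
    """Return (operand text, r digit, index size in bytes) for an addressing byte."""
    m, r, rm = (byte >> 6) & 3, (byte >> 3) & 7, byte & 7
    if m == 3:
        return REG16[rm], r, 0      # register instead of a memory operand
    if m == 0 and rm == 6:
        return "addr", r, 2         # direct address, stored as two bytes
    form = RM_FORMS[rm]
    if m == 0:
        form = form[1:]             # no index present: drop the leading X
    return form, r, m               # for m = 0, 1, 2 the index size equals m


print(decode_modrm(0o106))  # ('X(bp)', 0, 1)
print(decode_modrm(0o037))  # ('(bx)', 3, 0)
print(decode_modrm(0o006))  # ('addr', 0, 2)
```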

Immediates cannot be expressed in this addressing scheme; every instruction that takes an immediate encodes that fact in its opcode. Immediates are spelled $imm, just as in the PDP-11 syntax.

While Intel always used a dst, src operand ordering for its assembler, there was no particularly compelling reason to adopt this convention, so the UNIX assembler was written to use the src, dst operand ordering known from the PDP-11.

The implementation of the 8087 floating-point instructions introduced some inconsistencies with this ordering, possibly because Intel gave the two possible directions of non-commutative floating-point instructions different mnemonics, which do not match the operand ordering used by AT&T syntax.

The PDP-11 instructions jmp (jump) and jsr (jump to subroutine) jump to the address of their operand, similar to how lea works on the 8086. Thus, jmp foo jumps to foo, and jmp *foo jumps to the address stored in the variable foo.

The syntax for the x86's jmp and call instructions was designed as if these instructions worked as they do on the PDP-11, which is why jmp foo jumps to foo and jmp *foo jumps to the value stored at address foo, even though the 8086 doesn't actually have deferred addressing. This has the convenience of syntactically distinguishing direct jumps from indirect jumps without requiring a $ prefix on every direct jump target, but it doesn't make much sense logically.

The syntax was extended to specify segment prefixes using a colon:

seg:addr

When the 80386 was introduced, this scheme was adapted to its new SIB addressing modes using a four-part generic addressing mode:

disp(base,index,scale)

where disp is a displacement, base is a base register, index is an index register, and scale is 1, 2, 4, or 8, the factor by which the index register is multiplied. This is equivalent to the Intel syntax:

[disp+base+index*scale]
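
The correspondence between the two spellings can be sketched as a small text rewrite. This is a rough Python illustration under stated assumptions: the function name att_to_intel is invented here, the input is a single memory operand that includes the parentheses, register names lose their % prefix, and segment prefixes and operand-size keywords are not handled.

```python
import re

# disp(base,index,scale), where every part except the parentheses may be absent
_ATT_MEM = re.compile(
    r"^(?P<disp>[^()]*)\((?P<base>[^,()]*)"
    r"(?:,(?P<index>[^,()]*)(?:,(?P<scale>[1248]))?)?\)$"
)


def att_to_intel(operand):
    """Rewrite an AT&T memory operand as [disp+base+index*scale]."""
    match = _ATT_MEM.match(operand.replace(" ", ""))
    if match is None:
        raise ValueError("not a disp(base,index,scale) operand: " + operand)
    disp, base, index, scale = match.group("disp", "base", "index", "scale")
    terms = []
    if disp:
        terms.append(disp)
    if base:
        terms.append(base.lstrip("%"))
    if index:
        reg = index.lstrip("%")
        terms.append(reg + "*" + scale if scale else reg)
    return "[" + "+".join(terms) + "]"


print(att_to_intel("8(%ebx,%esi,4)"))   # [8+ebx+esi*4]
print(att_to_intel("table(,%ecx,4)"))   # [table+ecx*4]
print(att_to_intel("(%eax)"))           # [eax]
```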

Another notable feature of the PDP-11 is that most instructions are available in a byte variant and a word variant. Which one you use is indicated by a b or w suffix on the mnemonic, which directly toggles the most significant bit of the instruction word:

 010001   movw r0,r1
 110001   movb r0,r1

This was also adopted for AT&T syntax, as most 8086 instructions are likewise available in a byte mode and a word mode. Later, the 80386 and the AMD K8 introduced 32-bit instructions (suffixed l for long) and 64-bit instructions (suffixed q for quad), respectively.
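
The size suffixes can be summarised in a few lines of Python. This is only a sketch: both function names are hypothetical, the PDP-11 helper assumes a 16-bit instruction word whose top bit selects the byte variant as described above, and the x86 helper simply appends the suffix letter to a base mnemonic.

```python
# AT&T size suffixes by operand width in bits
SUFFIX_FOR_BITS = {8: "b", 16: "w", 32: "l", 64: "q"}


def pdp11_size_suffix(word):
    """Return 'b' if the top bit of a 16-bit PDP-11 instruction word is set, else 'w'."""
    return "b" if word & 0o100000 else "w"


def att_mnemonic(base, bits):
    """Append the AT&T size suffix for the given operand width to a base mnemonic."""
    return base + SUFFIX_FOR_BITS[bits]


print(pdp11_size_suffix(0o010001))  # w
print(pdp11_size_suffix(0o110001))  # b
print(att_mnemonic("mov", 32))      # movl
```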

Last but not least, the original convention was to prefix C language symbols with an underscore (as is still done on Windows), so that a C function named ax could be distinguished from the register ax. When Unix System Laboratories developed the ELF binary format, they decided to get rid of this decoration. Since there would otherwise be no way to distinguish a direct address from a register, a % prefix was added to every register:

mov direct,%eax # move memory at direct to %eax

And that's how we got today's AT&T syntax.
