MARS MIPS simulator's built-in assembler aligns more than requested?


Problem description

I have the following data segment

.data
a:  .byte   0x11
    .align  1
b:  .word   0x22334455

Assuming that address "a" is 0x10010000, then the expected address for the word at b is 0x10010002, but MARS stores the word at 0x10010004, ignoring the explicit ".align" directive. By the way, I used MARS MIPS simulator (Version 4.5 on a MacBook Pro) to assemble the above code.

Therefore, my question is: Is this a bug, or is it expected that the behavior of MARS differs from SGI's 1992 documentation for MIPS assembly language, e.g. Page 8-1 of this Pascal / Assembly manual?

(MARS and non-MARS MIPS asm docs agree that .align in MIPS syntax takes a power-of-2 argument, so .align 1 aligns to a 2^1 = 2-byte boundary. This is unlike GAS / Unix assembler syntax for some other architectures, where the .align argument is a byte count and an argument of 1 would be redundant.)
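The arithmetic behind the question can be checked with a short Python sketch. The helper name `align_up` is made up for this illustration; it simply rounds an address up to a multiple of 2^n, which is what the docs say `.align n` does. The second call assumes `.word` applies its own `.align 2`, which is the behaviour the question observed.

```python
def align_up(addr: int, n: int) -> int:
    """Round addr up to the next multiple of 2**n (the meaning of `.align n`)."""
    size = 1 << n
    return (addr + size - 1) & ~(size - 1)

# Location counter right after `a: .byte 0x11` at 0x10010000.
after_a = 0x10010000 + 1

expected_b = align_up(after_a, 1)  # explicit `.align 1`: 2-byte boundary
observed_b = align_up(after_a, 2)  # 4-byte boundary, as if `.word` did `.align 2`

print(hex(expected_b))  # 0x10010002
print(hex(observed_b))  # 0x10010004
```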

Solution

TL;DR: The MARS tooltip is misleading; you need to disable auto-alignment for the rest of the section using .align 0. You can't just under-align the next word.


.align 1 does align by 2; that's not the problem. You can see this by using it between .byte or .ascii directives.

For example, this source produces 0x00110062 as the first word of the .data section, just like .byte 'b', 0, 0x11, 0 would.

.data
a:  .ascii  "b"
b:
    .align  1
    .byte   0x11

And the b: label has address 2, after the alignment padding.

(I have MARS set to "compact" memory layout, data section starting at address 0 for simplicity.)
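As a quick sanity check on that word value, here is a small Python sketch that packs the four bytes the answer describes and reads them back as one little-endian word. The variable names are only for illustration.

```python
import struct

# Bytes the answer says end up in memory: 'b', one padding byte from
# `.align 1`, then 0x11, then a zero that fills out the rest of the word.
image = bytes([ord("b"), 0x00, 0x11, 0x00])

# "<I" means little-endian unsigned 32-bit, matching MARS's byte order.
(first_word,) = struct.unpack("<I", image)

print(f"0x{first_word:08x}")  # 0x00110062
```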


What we're seeing so far does match the Silicon Graphics documentation you linked for their Unix assembler. (Which is very different from how modern assemblers like GNU as (aka GAS) and clang work.)

That SGI documentation says:

Advance the location counter to make the expression low order bits of the counter zero. Normally, the .half, .word, .float, and .double directives automatically align their data appropriately. For example, .word does an implicit .align 2 (.double does an .align 3). You disable the automatic alignment feature with .align 0. The assembler reinstates automatic alignment at the next .text, .data, .rdata, or .sdata directive.

Labels immediately preceding an automatic or explicit alignment are also realigned. For example, foo: .align 3; .word 0 is the same as .align 3; foo: .word 0.

This doesn't say anything about using .align 1 to under-align the next .word. Only that you can fully turn off implicit alignment as part of data directives with .align 0. Having .align 1 override and under-align the next .word without having to disable auto-alignment would have made sense and been a valid design, but that's not a feature they chose to implement.

(Note that .align 0 is special: aligning by 1 byte never has to insert any padding; the current position is always a byte boundary. Since there's no reason to ever use .align 0 for aligning a single position, the designers of the syntax could overload it with a different meaning: disable auto-alignment.)

MARS does support that. (And then .align 1 would do what you expect, aligning to 2^1 = 2 without an implicit .align 2 as part of .word increasing the alignment after that.)

a:  .byte   1
    .align  1
b:
    .align  0             # on this line or any earlier line
    .word   0x22334455
    .word   0x66666666    # this word is also misaligned; auto-align is disabled

data section output:

0x44550001    0x66662233    0x00006666     as little-endian words
01 00 55 44   33 22 66 66   66 66 00 00    as bytes

And yes, .align (explicitly or as part of .word) doesn't just insert padding at the current position, it inserts it before any preceding labels, right after the last piece of data.
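The rules described so far can be sketched as a toy layout function in Python. This is only a model built from the SGI text and the observed MARS output, not MARS's actual implementation; the `layout` function and its tuple-based directive format are invented for this illustration.

```python
import struct

def layout(directives):
    """Lay out a data section using the SGI-style rules described above.

    Modelled rules (a simplification, not MARS's real code):
      * `.align n` with n > 0 pads to a multiple of 2**n
      * `.align 0` turns off the implicit alignment done by `.word`
      * `.word` otherwise aligns itself to 4 bytes
      * a label binds to the address after any padding that follows it
    """
    data = bytearray()
    labels = {}
    pending = []       # labels seen since the last data item was emitted
    auto_align = True

    def pad_to(n):
        while len(data) % (1 << n):
            data.append(0)

    for op, arg in directives:
        if op == "label":
            pending.append(arg)
        elif op == "align":
            if arg == 0:
                auto_align = False
            else:
                pad_to(arg)
        elif op in ("byte", "word"):
            if op == "word" and auto_align:
                pad_to(2)
            for name in pending:          # labels land after the padding
                labels[name] = len(data)
            pending.clear()
            data += struct.pack("<B" if op == "byte" else "<I", arg)

    for name in pending:                  # labels at the very end
        labels[name] = len(data)
    return bytes(data), labels

program = [
    ("label", "a"), ("byte", 1),
    ("align", 1),
    ("label", "b"), ("align", 0),
    ("word", 0x22334455),
    ("word", 0x66666666),
]
image, labels = layout(program)
print(image.hex(" "))  # 01 00 55 44 33 22 66 66 66 66
print(labels)          # {'a': 0, 'b': 2}
```

Dropping the `("align", 0)` entry from `program` makes the model place `b` at offset 4 instead, which corresponds to the behaviour in the question.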

You can of course emit whatever data you want using .byte or .half directives if you really want to avoid implicit alignment to 4-byte boundaries, without disabling auto-alignment. You normally don't actually want that, and it will save beginners from having alignment problems in most cases. MIPS is a heavily word-oriented ISA so there's usually little reason to have an under-aligned .word.

The only MARS bug I see is usability: a very misleading tooltip.

It currently says "align the next data item on specified byte boundary: (0=byte, 1=half, 2=word, 3=double)". This seems to imply that you could under-align a .word. It's also highly misleading about .align 0, which actually disables auto-alignment for the rest of the section.


This is not how .align works in assemblers that use GAS syntax (GNU as or clang). (e.g. see the GAS manual)

On my Linux desktop, I assembled your source code using clang -c -target mipsel mips-align.s ("mipsel" is Little-Endian MIPS, same as MARS uses.)

Then I used llvm-objdump to dump the .data section (with "disassembly" because that's the easiest way, although I had to clean up overlap from the b: label that doesn't start at a word boundary.)

$ llvm-objdump -D mips-align-clang-output.o         
00000000 a:
       0: 11 00                # manually cleaned up this line
00000002 b:
       2: 55 44 33 22                   addi    $19, $17, 17493

Note that b has address 2, not 4. (This is an un-linked .o; when linked into an executable the address would be higher, either statically for a position-dependent executable or only at run time for a PIE.)

In GAS syntax, .align simply inserts padding at that position until it reaches an alignment boundary. So you normally want to put such directives before labels, so the label address is aligned and comes after the padding. There's also no implicit .align as part of other directives.
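For contrast, here is a minimal Python sketch of the simpler rule the clang run shows: padding goes exactly where `.align` appears, labels bind immediately, and `.word` adds no alignment of its own. The function name `layout_gas_style` and the directive tuples are hypothetical, and the model only claims to match the clang output quoted above, not every GAS target.

```python
def layout_gas_style(directives):
    """Toy model: `.align` pads in place; labels never move afterwards."""
    data = bytearray()
    labels = {}
    for op, arg in directives:
        if op == "label":
            labels[arg] = len(data)            # bind to the current position
        elif op == "align":
            while len(data) % (1 << arg):      # pad here up to 2**arg
                data.append(0)
        elif op == "byte":
            data.append(arg)
        elif op == "word":
            data += arg.to_bytes(4, "little")  # no implicit alignment
    return bytes(data), labels

question = [
    ("label", "a"), ("byte", 0x11),
    ("align", 1),
    ("label", "b"), ("word", 0x22334455),
]
image, labels = layout_gas_style(question)
print(labels)          # {'a': 0, 'b': 2}
print(image.hex(" "))  # 11 00 55 44 33 22
```

In this model a label written before an `.align` keeps the unpadded address, which is why the directive normally goes ahead of the label.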

MARS's (and old-school SGI's) behaviour looks like "training wheels" to me, but I guess it makes some sense on a heavily word-oriented ISA like MIPS. That would explain why some code I've seen on SO with .asciz followed by .word works without alignment faults for loads/stores to the word! Still, it has a downside if you want the assembler to calculate the length of a string constant for you:


If MARS's built-in assembler even let you do msg_len = msg_end - msg (subtracting labels from the end and start of a .ascii for example, like you would in GAS or NASM syntax), moving preceding labels could break that for a .word after a string. (By including the padding in a length calculation for a loop over the string.)
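To make that concrete, here is the arithmetic as a small Python sketch. It assumes a hypothetical 5-byte string at `msg`, a `msg_end` label right after it, and a `.word` following that label; under the SGI rule quoted earlier, `msg_end` would be moved past the padding. The names are only illustrative, since MARS has no such label arithmetic.

```python
def align_up(addr: int, n: int) -> int:
    """Round addr up to the next multiple of 2**n."""
    size = 1 << n
    return (addr + size - 1) & ~(size - 1)

msg = 0                  # address of the string, assumed 0 for simplicity
text = b"hello"          # what `.ascii "hello"` would emit

# Without label realignment, msg_end would sit right after the string.
end_of_string = msg + len(text)

# With realignment, the label moves to where the following `.word` starts.
msg_end = align_up(end_of_string, 2)

print(len(text))       # 5  (the real string length)
print(msg_end - msg)   # 8  (what msg_end - msg would give: padding included)
```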

But MARS's assembler sucks too much to let you calculate sizes at assemble time, so retroactively moving earlier labels is not usually a problem. I'm not sure if classic MIPS assemblers let you subtract local labels at assemble time to get a constant length (e.g. addiu $t0, $zero, end-start) or not. MARS doesn't, so this bizarre (if you're used to modern assemblers) "mis"feature doesn't usually cause that problem, unless you la start and end labels into registers for use in a pointer increment loop with a bne loop condition.

Hard-coding is dumb, and it sucks when an assembler makes you do it (by not providing good label - label features.)

It seems that MARS just inherited that misfeature from SGI's assembler (or wherever this design decision originally came from).
