Is the MARS MIPS simulator big-endian or little-endian?

Question

I have to determine whether the MARS simulator is big- or little-endian as homework. This seemed pretty straightforward at first, but I am having some issues.

First I tried storing 4 bytes in memory with .byte 0, 0, 0, 1. In memory this appears as 0x01000000, i.e. in reverse order, which seems to indicate that the simulator is little-endian. However, when I load the 4 bytes as an integer into a register, what appears in the register is 0x01000000 again; as I understand it, if it were little-endian, what would be loaded is 0x00000001.

Also, when storing 4 bytes with .word 1, what is stored is 0x00000001, with no bytes reversed this time.

I would like to know whether the simulator is big- or little-endian, and an explanation for this behaviour.

Answer

There are several layers involved in your question, so I'll try to address them one by one...

Machine:

The machine has memory addressable by bytes. The first byte has address 0, the second has address 1, etc. Whenever I write about the content of memory in this answer, I will use this formatting: 01 02 0E 0F 10 ..., using hexadecimal values with spaces between bytes, and with addresses going continually from the starting address toward the ending address. I.e. if this content started at address 0x800000, the memory would be (all hexadecimal):

address | byte value
------- | ----------
800000  | 01
800001  | 02
800002  | 0E
800003  | 0F
800004  | 10
800005  | ...

So far it does not matter whether the target MIPS platform is little- or big-endian: as long as byte-sized memory is involved, the order of bytes is "normal".

If you load a byte from address 0x800000 into t0 (with the lb instruction), t0 will be equal to 1.

If you load a word from address 0x800000 into t0 (with the lw instruction), the endianness finally comes into play.

On a little-endian machine t0 will be equal to 0x0F0E0201: the first byte of the word (in memory) is the amount of 256^0 (the lowest power), the second is the amount of 256^1, ... and the last one is the amount of 256^3.

On a big-endian machine t0 will be equal to 0x01020E0F: the first byte of the word (in memory) is the amount of 256^3, the second is the amount of 256^2, ... and the last one is the amount of 256^0.

(256 is 2^8, and that magic number comes from "one byte is 8 bits": one bit can hold two values (0 or 1), and one byte has 8 bits, so one byte can hold 2^8 different values.)

In both cases the CPU will read the same four bytes from memory (at addresses 0x800000 to 0x800003), but the endianness defines in which order they will appear as the final 32 bits of word value.
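
The two readings of the same four bytes can be checked outside the simulator. The following is only a sketch of the arithmetic in Python, not of what MARS does internally; the helper name `word_from_bytes` is made up for illustration.

```python
# Four bytes as they sit in memory at 0x800000..0x800003.
memory = bytes([0x01, 0x02, 0x0E, 0x0F])

def word_from_bytes(data, byteorder):
    """Combine bytes into one integer, weighting each byte by a power of 256."""
    if byteorder == "little":
        # first byte in memory is multiplied by 256**0, the last by 256**3
        weights = range(len(data))
    else:
        # big-endian: first byte is multiplied by 256**3, the last by 256**0
        weights = range(len(data) - 1, -1, -1)
    return sum(b * 256 ** w for b, w in zip(data, weights))

little = word_from_bytes(memory, "little")   # 0x0F0E0201
big = word_from_bytes(memory, "big")         # 0x01020E0F
print(hex(little), hex(big))
```

Python's built-in `int.from_bytes(memory, "little")` and `int.from_bytes(memory, "big")` compute the same two values; the explicit sum is spelled out here only to mirror the "amount of 256^n" wording.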

t0 is physically formed by 32 bits on the CPU chip; it has no address. When you want to refer to it in a CPU instruction (i.e. use the value stored in t0), you encode it into the instruction as register $8 ($8 has the $t0 alias for convenience in your assembler, so I'm using that alias).

Endianness does not apply to those 32 bits of the register. They are simply 32 bits b0-b31, and once the value 0x0F0E0201 is loaded, they are set to 0000 1111 0000 1110 ... (I'm writing them from the top bit b31 down to the bottom bit b0, to make sense of the shift left/right instructions and to make it read as a human-formatted binary number). There's no point in thinking about the endianness of a register or the physical order in which the bits are stored on the chip; it's enough to think of it as a full 32-bit value, and arithmetic instructions will treat it as such.

When loading a byte value into a register with lb, it lands in bits b0-b7, with b8-b31 containing copies of b7 (sign-extending the signed 8-bit value into a signed 32-bit value).
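
The sign extension can be modelled in a few lines of Python. This is a rough sketch: `lb_model` and `lbu_model` are illustrative names, and the zero-extending variant is included only for contrast (MIPS has an lbu instruction for that, although this answer does not otherwise use it).

```python
def lb_model(byte_value):
    """Model of lb: sign-extend an 8-bit value into a 32-bit register value."""
    byte_value &= 0xFF
    if byte_value & 0x80:
        # b7 is set, so it is copied into b8..b31
        return byte_value | 0xFFFFFF00
    return byte_value

def lbu_model(byte_value):
    """Model of lbu: zero-extend, so b8..b31 are always cleared."""
    return byte_value & 0xFF

print(hex(lb_model(0x01)))   # small positive byte stays as it is
print(hex(lb_model(0x80)))   # b7 set: upper 24 bits become ones
print(hex(lbu_model(0x80)))  # zero-extended: upper 24 bits stay zero
```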

When storing the value of a register into memory, endianness applies again, i.e. storing the word value 0x11223344 into memory on a little-endian machine will set the individual bytes to 44 33 22 11.
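
As a quick check of the store direction, here is a small Python sketch that splits a 32-bit value into the bytes a store would write under each byte order (plain integer arithmetic, nothing MARS-specific; `bytes.hex` with a separator needs Python 3.8 or newer):

```python
value = 0x11223344

# Bytes in increasing address order, as a 4-byte store would lay them out.
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

print(little.hex(" "))  # 44 33 22 11
print(big.hex(" "))     # 11 22 33 44
```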

Assembler (source code and compilation)

A well-configured assembler for its target platform will hide the endianness from the programmer, to make the use of word values convenient.

So when you define memory value like:

myPreciousValue: .word 0x11223344

The assembler will parse the text "0x11223344" (10 bytes: 30 78 31 31 32 32 33 33 34 34), calculate the 32-bit word value 0x11223344 out of it, and then apply the target-platform endianness to produce four bytes of machine code. (Your source code is text (!), i.e. one character is one byte value in ASCII encoding. If you write the source in a UTF-8 text editor and use non-ASCII characters, they may be encoded across multiple bytes; the printable ASCII characters have the same encoding in both ASCII and UTF-8 and occupy a single byte only.) The four bytes are either:

44 33 22 11           # little-endian target

or:

11 22 33 44           # big-endian target

When you then use the lw instruction in your code to load myPreciousValue from memory into a register, the register will contain the expected word value 0x11223344 (as long as you didn't mix up your assembler configuration and use the wrong endianness; that can't happen in MARS, which is little-endian in everything (VM, assembler, debugger), while SPIM follows the byte order of the host machine it runs on).

So the programmer does not have to think about endianness every time a 32-bit value is written somewhere in the source; the assembler will parse it and turn it into the target variant of byte values.

If the programmer wants to define four bytes 01 02 03 04 in memory, she can be "smart" and use .word 0x04030201 for a little-endian target platform, but that obfuscates the original intent, so I suggest using .byte 1, 2, 3, 4 in such a case, as the programmer's intent was to define bytes, not a word.

When you declare byte values with the .byte directive, they are assembled in the order in which you write them; no endianness is applied to them.
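
Applied to the experiment from the question, and assuming a little-endian target, the two directives can be modelled with Python's `struct` module. This is a sketch of the byte layout only; the variable names are made up for illustration.

```python
import struct

# .byte 0, 0, 0, 1 -> bytes are emitted exactly in source order
byte_directive = bytes([0, 0, 0, 1])

# .word 1 -> a little-endian assembler emits the lowest byte first
word_directive = struct.pack("<I", 1)

# lw on a little-endian machine: the first byte in memory is the lowest byte
loaded_from_bytes = struct.unpack("<I", byte_directive)[0]
loaded_from_word = struct.unpack("<I", word_directive)[0]

print(word_directive.hex(" "))          # 01 00 00 00
print(f"0x{loaded_from_bytes:08X}")     # 0x01000000
print(f"0x{loaded_from_word:08X}")      # 0x00000001
```

Under that assumption the numbers match what the question reports: the word built from .byte 0, 0, 0, 1 reads as 0x01000000, and .word 1 reads back as 0x00000001.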

Debugger

And finally the memory/register view of the debugger... This tool again will try hard to work in an intuitive/convenient way, so when you check the memory view and have it configured to bytes, the memory will be shown as:

0x800000: 31 32 33 34 41 42 43 44 | 1234ABCD

When you switch it to "word" view, it will use the configured endianness to concatenate bytes in the target platform order, i.e. in MARS, as a little-endian platform, it will show for the same memory:

0x800000: 34333231 44434241

(If the ASCII view is also included, is it "worded" too? If yes, then it will look like 4321 DCBA. I don't have MARS/SPIM installed at the moment to check what the memory view in their debugger actually looks like, sorry.)

So you, as the programmer, can read the "word" value directly from the display without shuffling the bytes into the "correct" order; you already see what the "word" value will be (from those four bytes of memory content).
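
The regrouping that such a "word" view performs could look roughly like this in Python. `word_view` is a hypothetical helper written for this illustration, not something taken from MARS or SPIM:

```python
def word_view(data, byteorder="little"):
    """Format a byte string as space-separated 32-bit hexadecimal words."""
    words = []
    for offset in range(0, len(data), 4):
        chunk = data[offset:offset + 4]
        # combine the four bytes according to the chosen byte order
        words.append(f"{int.from_bytes(chunk, byteorder):08x}")
    return " ".join(words)

memory = b"1234ABCD"  # bytes 31 32 33 34 41 42 43 44

print(word_view(memory, "little"))  # 34333231 44434241
print(word_view(memory, "big"))     # 31323334 41424344
```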

The register view usually shows hexadecimal word values by default, i.e. after loading the word from address 0x800000 into t0, register $8 will contain the value 0x34333231 (875770417 in decimal).

If you are interested in the value of the first byte in memory used for that load, at this point you have to apply your knowledge of the target platform's endianness and look either at the first two digits "34" (big-endian) or at the last two digits "31" (little-endian) in the register view (or rather use the memory view in byte mode to avoid any mistake).

Runtime detection in code.

So with all that information above, the runtime detection code should be easy to understand (unfortunately I don't have MARS/SPIM at the moment, so I didn't verify it works, let me know):

.data

checkEndianness: .word 0    # temporary memory for function
    # can be avoided by using stack memory instead (in function)

.text

main:
    jal  IsLittleEndian
    # ... do something with v0 value ...
    li   $v0, 10    # exit: syscall service 10 in MARS/SPIM
    syscall

# --- functions ---

# returns (in v0) 0 for big-endian machine, and 1 for little-endian
IsLittleEndian:
    # set value of register to 1
    li $v0,1
    # store the word value 1 into memory (4 bytes written)
    sw $v0, checkEndianness
      # memory contains "01 00 00 00" on little-endian machine
      #              or "00 00 00 01" on big-endian machine
    # load only the first byte back
    lb $v0, checkEndianness
    jr $ra
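
For comparison, the same store-a-word, read-back-the-first-byte trick can be sketched on the host machine in Python. The function name mirrors the assembly routine but is otherwise an illustration; it reports the byte order of the machine running Python, not of MARS.

```python
import struct
import sys

def is_little_endian():
    """Return True if the host stores the lowest byte of a word first."""
    # "=I" packs a 32-bit unsigned value in the host's native byte order
    first_byte = struct.pack("=I", 1)[0]
    # 01 00 00 00 on a little-endian host, 00 00 00 01 on a big-endian one
    return first_byte == 1

print(is_little_endian())
print(sys.byteorder)  # the standard library reports the same information
```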

What is it good for? As long as you write your software for a single target platform, and words are stored/loaded by the target CPU, you don't need to care about endianness.

But if you have multi-platform software that saves binary files, then to make the files work the same way on both big- and little-endian platforms, the specification of the file structure must also specify the endianness of the file data. According to that spec, one type of target platform may read the data as "native" word values, while the other will have to shuffle the bytes in word values to read the correct value (plus the spec should also specify how many bytes a "word" is :) ). Such a runtime test may then be handy: you can include the shuffler in the save/load routines and use the endianness detection routine to decide whether the word bytes have to be shuffled or not. That makes the target platform's endianness "transparent" to the remaining code, which simply sends its native "word" values to the save/load routine, and your save/load code may use the same source on every platform (at least as long as you use some multi-platform portable programming language like C; the assembly for MIPS will of course not work on different CPUs at all and would need to be rewritten from scratch).

Network communication is also often done with custom binary protocols (usually wrapped in the most common TCP/IP packets for the network layer, or even encrypted, but your application will extract the raw byte content at some point). Then the endianness of sent/received data matters, and the "other" platforms have to shuffle the bytes to read correct values.
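
A minimal sketch of such save/load routines in Python, assuming a made-up file spec of 32-bit unsigned words stored little-endian (`WORD_FORMAT`, `save_words` and `load_words` are illustrative names, not part of any real format):

```python
import io
import struct

# Assumed spec for this sketch: 32-bit unsigned words, little-endian on disk.
WORD_FORMAT = "<I"

def save_words(stream, words):
    """Write native integers to the stream in the byte order the spec demands."""
    for word in words:
        stream.write(struct.pack(WORD_FORMAT, word))

def load_words(stream):
    """Read the whole stream back into native integers."""
    data = stream.read()
    return [word for (word,) in struct.iter_unpack(WORD_FORMAT, data)]

buffer = io.BytesIO()
save_words(buffer, [1, 0x11223344])
print(buffer.getvalue().hex(" "))  # 01 00 00 00 44 33 22 11
buffer.seek(0)
print(load_words(buffer))
```

Here `struct` does the shuffling itself whenever the host order differs from the format string, so no explicit detection call is needed; in a language without such a helper, the detection routine would decide whether to swap.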
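
Many protocols use big-endian as their "network byte order", and Python's `struct` has the `!` prefix for it. The sketch below assumes a hypothetical 6-byte header (a 16-bit message id followed by a 32-bit payload length); the layout and function names are invented for the example.

```python
import struct

# Hypothetical header: 16-bit message id, then 32-bit payload length,
# both in network byte order (big-endian), with no padding.
HEADER_FORMAT = "!HI"

def encode_header(message_id, payload_length):
    return struct.pack(HEADER_FORMAT, message_id, payload_length)

def decode_header(data):
    return struct.unpack(HEADER_FORMAT, data)

header = encode_header(0x0102, 0x11223344)
print(header.hex(" "))        # 01 02 11 22 33 44
print(decode_header(header))
```

Both ends get the same values back regardless of their own native byte order, because the order on the wire is fixed by the format string.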

Other platforms (non-MIPS)

Pretty much everything from above applies; just check what byte and word mean on the other platform (I think byte has been pretty much set in stone as 8 bits for the last 35+ years, but word may differ; for example, on x86 platforms a word is only 16 bits). A little-endian machine will still read "word" bytes in "reversed" order: the first byte is used as the amount of the smallest power, 256^0, and the last byte as the amount of the highest power of 256 (256^1 on x86, as only two bytes form a word there; the MIPS "word" is called a "double word" or "dword" in the x86 world).
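
The difference in word size can be illustrated with the same four little-endian bytes read once as an x86-style 16-bit word and once as a 32-bit dword. This is a Python sketch with made-up sample bytes:

```python
import struct

data = bytes([0x34, 0x12, 0x78, 0x56])

# x86 "word": two bytes, first byte counts 256**0, second counts 256**1
word16 = struct.unpack_from("<H", data, 0)[0]   # 0x1234

# x86 "dword" (same size as a MIPS word): four bytes, up to 256**3
dword32 = struct.unpack("<I", data)[0]          # 0x56781234

print(hex(word16), hex(dword32))
```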
