What is PC-relative addressing and how can I use it in MASM?
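Problem description
I'm following Jack Crenshaw's compiler tutorial (if you look at my profile, that's what all my questions are about, lol) and I've reached the point where variables are introduced. He comments that the 68k requires everything to be "position-independent", which means it's "PC-relative". I get that PC is the program counter, and that on x86 it's EIP. But he uses syntax like
MOVE X(PC),D0
where X is a variable name. I've read a little ahead and nothing later says anything about declaring a variable in .data. How does this work? To make this work on x86, what would I replace X(PC) with in
MOV EAX, X(PC)
?
To be honest, I'm not even sure this is supposed to output working code yet, but up to this point it has, and I've added code to my compiler that adds the appropriate headers etc., plus a batch file to assemble, link and run the result.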
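Solution
Here's a short overview of what a statically allocated global variable (which is what this question is about) really is, and what to do about them.
What is a variable anyway
To the machine, there is no such thing as a variable. It never hears about them, never cares about them, and has no concept of them. They're just a convention for assigning a consistent meaning to a particular location in RAM (or, with virtual memory, a position in your address space).
Where you actually put a variable is up to you, within reason. If you're going to write to it (and you probably are), it had better be in a writable location: the address of that variable should fall within a memory area that is allocated and writable. The .data section is just another convention for that. You don't have to call it that, and you don't even need a separate section (you could make your .text section writable and allocate your globals there, if you really wanted). You could even use OS functions like VirtualAllocEx (or equivalent) to allocate memory at a fixed position and use that (but don't do that). It's up to you, but the .data section is a convenient place to put them.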
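"Allocating" the variables is just a matter of choosing an address for each one such that it doesn't overlap with any other variable. That's not hard, just lay them out sequentially: start a pointer var_ptr at the beginning of wherever you're going to put them (so the VA of your .data section, or 0 if you're using a linker), and then for every variable v:
- the location l of v is align(var_ptr, round_up_to_power_of_2(sizeof(v)))
- set var_ptr to l + sizeof(v)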
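The two steps above can be sketched in a few lines of Python. This is only an illustration: the function names mirror the pseudocode, while the `(name, size)` input format and the sample variables are assumptions made for the example.

```python
def round_up_to_power_of_2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def align(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def layout(variables, base=0):
    """variables: list of (name, size) pairs (hypothetical input format).

    Returns ({name: address}, first free address after the last variable).
    """
    var_ptr = base
    addresses = {}
    for name, size in variables:
        loc = align(var_ptr, round_up_to_power_of_2(size))
        addresses[name] = loc
        var_ptr = loc + size
    return addresses, var_ptr


addrs, end = layout([("a", 1), ("b", 4), ("c", 2), ("d", 8)])
```

With `base=0` the results are section-relative offsets, which is the form you want when a linker decides the final address of the section.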
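As a minor variation, you could skip the alignment (most compiler textbooks do that, but in real life you should align). x86 usually lets you get away with unaligned accesses.
As a bigger variation, you could try to "fill the holes" left by the alignments. The simplest way to fill at least most holes is to sort the variables biggest-first (that fills all holes if all sizes are powers of two). While that may save some space (though not necessarily any, because sections are themselves aligned), it never saves much. Under the usual alignment rules, the "just lay them out sequentially" algorithm will, at worst, waste nearly half the space it uses on holes. The pattern that leads to that is an alternating sequence of the smallest type and the biggest type. And let's be honest, that wouldn't really happen, and even if it did, it wouldn't be all that bad.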
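To get a feel for the worst case, here is a small Python sketch that lays out an alternating sequence of 1-byte and 8-byte variables, first in declaration order and then sorted biggest-first. The helper name `layout_end` and the sample sizes are assumptions for illustration only.

```python
def layout_end(sizes, base=0):
    """Lay out variables of the given sizes sequentially, each aligned to its
    size rounded up to a power of two; return the first free address."""
    ptr = base
    for size in sizes:
        alignment = 1
        while alignment < size:
            alignment *= 2
        ptr = (ptr + alignment - 1) & ~(alignment - 1)  # skip the hole, if any
        ptr += size
    return ptr


alternating = [1, 8, 1, 8, 1, 8]          # smallest type, biggest type, ...
payload = sum(alternating)                # bytes actually needed
in_order = layout_end(alternating)        # space used in declaration order
biggest_first = layout_end(sorted(alternating, reverse=True))
```

In this example the payload is 27 bytes; declaration order uses 48 bytes (21 of them holes), while the biggest-first order uses exactly 27.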
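Then you have to make sure that the .data section is big enough to hold all variables, and that its initial contents match what the variables were initialized with.
But you don't even have to do any of this. You can use variable declarations in the assembly code (you know how to do this), and then the assembler/linker (they typically both play a role in this) will do all of it for you (and, of course, it will also replace variable names with variable addresses).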
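If the compiler emits assembly text, leaving the work to the assembler can be as simple as printing one declaration per variable. The sketch below assumes a hypothetical `(label, size, initial_value)` tuple per variable and uses the standard MASM data directives DB, DW, DD and DQ; the labels are made up.

```python
# Map a variable size in bytes to the MASM data directive of that width.
MASM_DIRECTIVE = {1: "DB", 2: "DW", 4: "DD", 8: "DQ"}


def emit_data_section(variables):
    """variables: list of (label, size_in_bytes, initial_value) tuples
    (a hypothetical format chosen for this example)."""
    lines = [".data"]
    for label, size, initial in variables:
        lines.append(f"{label} {MASM_DIRECTIVE[size]} {initial}")
    return "\n".join(lines)


listing = emit_data_section([("_17", 4, 0), ("_23", 1, 65)])
print(listing)
```

The assembler then takes care of placement and alignment within the section, and the linker assigns the final addresses.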
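How to use a variable
It depends. If you're using an assembler/linker, just refer to the label that you gave the variable. The label does not have to match the name in the source code; it can be any legal unique name (for example, you could use the AST node ID of the declaration with an underscore in front of it).
So loading a variable could look like this:
mov eax, dword ptr [variablelabel]
Or, on x64, perhaps this:
mov eax, dword ptr [rel variablelabel]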
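That would emit a RIP-relative address. (The rel keyword is NASM syntax, where the operand is written dword [rel variablelabel]; 64-bit MASM has no rel keyword and encodes a plain label reference as RIP-relative by default.) If you do that, you don't have to care about the current value of RIP or where the variable is allocated; the assembler/linker will take care of it. On x64, using a RIP-relative address like that is common, for several reasons:
- it allows the .data section to be somewhere that isn't in the first 4GB (or 2GB) of address space, as long as it's close to the .text section (within reach of a signed 32-bit displacement)
- it's shorter than an instruction with an absolute 64-bit address
- there are only two instructions that even take an absolute 64-bit address, namely mov rax,[imm64] and mov [imm64],rax (along with their AL/AX/EAX forms)
- you get relocation for free (the image can be loaded at a different base without patching these references)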
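The size argument can be checked by building both encodings by hand. The sketch below is illustrative only: the helper names and the sample displacement and address are assumptions, while the opcode bytes are the standard x86-64 encodings of the two loads into RAX.

```python
import struct


def mov_rax_rip_relative(disp32):
    # 48 = REX.W, 8B = MOV r64, r/m64, 05 = ModRM selecting [RIP + disp32], reg = RAX
    return bytes([0x48, 0x8B, 0x05]) + struct.pack("<i", disp32)


def mov_rax_absolute(addr64):
    # 48 = REX.W, A1 = MOV RAX, moffs64 (the accumulator-only absolute form)
    return bytes([0x48, 0xA1]) + struct.pack("<Q", addr64)


rip_relative = mov_rax_rip_relative(0x1000)       # made-up displacement
absolute = mov_rax_absolute(0x0000000140003000)   # made-up address
```

The RIP-relative load is 7 bytes, against 10 bytes for the absolute form.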
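If you're not using an assembler and/or linker, it becomes (at least to some extent) your own job to replace variable names with whatever addresses you allocated for them (if you're using a linker but no assembler, you'd produce relocation data, but you wouldn't decide on the absolute addresses of variables yourself).
When you're using absolute addresses, you can "put them in" in parallel with emitting instructions (provided you've already allocated the variables). When you're using RIP-relative addresses, you can only put them in once you have decided where the code will be. So you'd emit code in which the offsets are 0, do some bookkeeping, decide where the code will be, then go back and replace the 0's with the real offsets. That is a non-trivial problem in itself, unless you take the naive route and don't care about branch-size optimization; in that case you know the address of an instruction at the time you emit it, and therefore what the offset of a variable relative to RIP will be. A RIP-relative offset is easy enough to calculate: subtract the address of the position immediately after the current instruction (the value RIP has while that instruction executes) from the VA (virtual address) of the variable.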
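The "emit zeros, keep books, patch later" approach might look like the following Python sketch. The `CodeBuffer` class, its method names and the sample addresses are hypothetical; the only instruction it knows is `mov eax, [rip + disp32]` (bytes 8B 05 followed by the displacement), and it assumes the displacement is the last field of the instruction, so the next instruction starts 4 bytes after the displacement's offset.

```python
import struct


class CodeBuffer:
    def __init__(self):
        self.code = bytearray()
        self.fixups = []  # (offset of the disp32 field in self.code, variable name)

    def emit_load_eax(self, var_name):
        self.code += bytes([0x8B, 0x05])              # mov eax, [rip + disp32]
        self.fixups.append((len(self.code), var_name))
        self.code += bytes(4)                         # placeholder displacement of 0

    def resolve(self, code_va, var_vas):
        """Patch every placeholder once the code and variable VAs are known."""
        for offset, name in self.fixups:
            next_rip = code_va + offset + 4           # address just past the instruction
            disp = var_vas[name] - next_rip
            struct.pack_into("<i", self.code, offset, disp)
        return bytes(self.code)


buf = CodeBuffer()
buf.emit_load_eax("x")
buf.emit_load_eax("y")
machine_code = buf.resolve(code_va=0x401000, var_vas={"x": 0x403000, "y": 0x403004})
```

For an instruction that carries an immediate after the displacement, the `+ 4` would have to grow by the size of that immediate.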
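But that's not all
You may want to make some variables non-writable, to the point that any attempt to write to them in "funny ways that the compiler can't detect" will fail. That can be accomplished by putting them in a read-only section, typically called .rdata (but the name is irrelevant really; what matters is whether the "writable" flag of the section is set in the PE header). This isn't done often, though it is sometimes used for string or array constants (which aren't properly variables).
What is done regularly is putting zero-initialized variables in their own section, one that takes no space in the executable file but is instead simply zeroed out when loaded. Putting zero-initialized variables there may save some space in the executable. This section is commonly called .bss (not short for bullsh*t section), but as always, the name is irrelevant.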
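When emitting MASM text, one way to get this split is to sort declarations into the simplified segment directives `.data`, `.data?` (uninitialized data) and `.const` (read-only data). The Python sketch below is only an illustration: the `(label, initial_value, writable)` tuple format is hypothetical, every variable is assumed to be DWORD-sized, and it relies on uninitialized data being zero-filled when the image is loaded.

```python
def pick_section(initial_value, writable=True):
    """Choose a MASM segment directive for one variable."""
    if not writable:
        return ".const"       # read-only data
    if initial_value == 0:
        return ".data?"       # uninitialized data, takes no space in the file
    return ".data"


def emit_sections(variables):
    """variables: list of (label, initial_value, writable) tuples, all DWORDs."""
    by_section = {".data": [], ".data?": [], ".const": []}
    for label, initial, writable in variables:
        section = pick_section(initial, writable)
        operand = "?" if section == ".data?" else str(initial)
        by_section[section].append(f"{label} DD {operand}")
    out = []
    for section, decls in by_section.items():
        if decls:             # skip sections that ended up empty
            out.append(section)
            out.extend(decls)
    return "\n".join(out)


listing = emit_sections([("_1", 5, True), ("_2", 0, True), ("_3", 42, False)])
print(listing)
```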
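More
Most compiler textbooks deal with this subject to varying degrees, though usually not in much detail, because when you get right down to it, static variables aren't hard. Certainly not compared to most other aspects of compilation. Also, some aspects are very platform-specific, such as the details of the sections and how things actually end up in an executable.
Some sources/useful things (I've found all of these useful while working on compilers):
- PE101
- PE in depth
- PE Explorer
- CFF Explorer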