gfortran for dummies: What does mcmodel=medium do exactly?
Problem description
I have some code that gives me relocation errors when compiling. Below is an example that illustrates the problem:

      program main
      common/baz/a,b,c
      real a,b,c
      b = 0.0
      call foo()
      print*, b
      end

      subroutine foo()
      common/baz/a,b,c
      real a,b,c
      integer, parameter :: nx = 450
      integer, parameter :: ny = 144
      integer, parameter :: nz = 144
      integer, parameter :: nf = 23*3
      real :: bar(nf,nx*ny*nz)
      !real, allocatable,dimension(:,:) :: bar
      !allocate(bar(nf,nx*ny*nz))
      bar = 1.0
      b = bar(12,32*138*42)
      return
      end
Compiling this with

    gfortran -O3 -g -o test test.f

I get the following error:

    relocation truncated to fit: R_X86_64_PC32 against symbol `baz_' defined in COMMON section in /tmp/ccIkj6tt.o

But it works if I use

    gfortran -O3 -mcmodel=medium -g -o test test.f

Also note that it works if I make the array allocatable and allocate it within the subroutine.

My question is: what exactly does -mcmodel=medium do? I was under the impression that the two versions of the code (the one with allocatable arrays and the one without) were more or less equivalent ...

Solution

Since bar is quite large, the compiler generates a static allocation instead of an automatic allocation on the stack. Static arrays are created with the .comm assembly directive, which creates an allocation in the so-called COMMON section. Symbols from that section are gathered, same-named symbols are merged (reduced to one symbol request with size equal to the largest size requested), and what is left is mapped to the BSS (uninitialised data) section in most executable formats. With ELF executables the .bss section is located in the data segment, just before the data segment part of the heap (there is another heap part managed by anonymous memory mappings which does not reside in the data segment).

With the small memory model, 32-bit addressing instructions are used to address symbols on x86_64. This makes code smaller and also faster. Some assembly output when using the small memory model:

    movl    $bar.1535, %ebx          <---- Instruction length saving
    ...
    movl    %eax, baz_+4(%rip)       <---- Problem!!
    ...
    .local  bar.1535
    .comm   bar.1535,2575411200,32
    ...
    .comm   baz_,12,16
This uses a 32-bit move instruction (5 bytes long) to put the value of the bar.1535 symbol (this value equals the address of the symbol location) into the lower 32 bits of the RBX register (the upper 32 bits get zeroed). The bar.1535 symbol itself is allocated using the .comm directive. Memory for the baz COMMON block is allocated afterwards. Because bar.1535 is very large, baz_ ends up more than 2 GiB from the start of the .bss section. This poses a problem in the second movl instruction: addressing the b variable, into which the value of EAX has to be moved, would need an offset from RIP that does not fit in 32 bits (signed). This is only detected at link time. The assembler itself does not know the appropriate offset, since it does not know what the value of the instruction pointer (RIP) will be (it depends on the absolute virtual address where the code is loaded, and this is determined by the linker), so it simply puts in an offset of 0 and then creates a relocation request of type R_X86_64_PC32. It instructs the linker to patch the value of 0 with the real offset value. But the linker cannot do that, since the offset value would not fit inside a signed 32-bit integer, and hence it bails out.

With the medium memory model in place, things look like this:

    movabsq $bar.1535, %r10
    ...
    movl    %eax, baz_+4(%rip)
    ...
    .local  bar.1535
    .largecomm      bar.1535,2575411200,32
    ...
    .comm   baz_,12,16
First, a 64-bit immediate move instruction (10 bytes long) is used to put the 64-bit value which represents the address of bar.1535 into register R10. Memory for the bar.1535 symbol is allocated using the .largecomm directive, and thus it ends up in the .lbss section of the ELF executable. .lbss is used to store symbols which might not fit in the first 2 GiB (and hence should not be addressed using 32-bit instructions or RIP-relative addressing), while smaller things go to .bss (baz_ is still allocated using .comm and not .largecomm). Since the .lbss section is placed after the .bss section in the ELF linker script, baz_ does not end up being inaccessible using 32-bit RIP-relative addressing.

All addressing modes are described in the System V ABI: AMD64 Architecture Processor Supplement. It is heavy technical reading, but a must-read for anybody who really wants to understand how 64-bit code works on most x86_64 Unixes.
When an ALLOCATABLE array is used instead, gfortran allocates heap memory (most likely implemented as an anonymous memory map given the large size of the allocation):

    movl    $2575411200, %edi
    ...
    call    malloc
    movq    %rax, %rdi
This is basically RDI = malloc(2575411200). From then on, elements of bar are accessed by using positive offsets from the value stored in RDI:

    movl    51190040(%rdi), %eax
    movl    %eax, baz_+4(%rip)
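The displacement 51190040 in the first movl can be reproduced by hand. The sketch below assumes Fortran's column-major storage with 1-based indices and a 4-byte default REAL; the helper name byte_offset is only for illustration. For comparison it also computes the offset of an element in the last column of bar, which is too large for the signed 32-bit displacement field of an instruction.

```python
# Byte offset of bar(i, j) from the start of the array, assuming
# column-major storage, 1-based indices and 4-byte REAL elements.

NF = 23 * 3                 # first (fastest-varying) extent of bar
NCOLS = 450 * 144 * 144     # second extent, nx*ny*nz
REAL_SIZE = 4
DISP32_MAX = 2**31 - 1      # largest positive signed 32-bit displacement


def byte_offset(i, j, nrows=NF, elem_size=REAL_SIZE):
    """Offset in bytes of element (i, j) of a column-major 2-D array."""
    return ((j - 1) * nrows + (i - 1)) * elem_size


near = byte_offset(12, 32 * 138 * 42)   # the element read in the question
far = byte_offset(12, NCOLS)            # an element in the last column

print(near, near <= DISP32_MAX)   # 51190040 True
print(far, far <= DISP32_MAX)     # 2575410968 False
```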
For locations that are more than 2 GiB from the start of bar, a more elaborate method is used. E.g. to implement b = bar(12,144*144*450) gfortran emits:

    ; Some computations that leave the offset in RAX
    movl    (%rdi,%rax), %eax
    movl    %eax, baz_+4(%rip)
This code is not affected by the memory model, since nothing is assumed about the address where the dynamic allocation would be made. Also, since the array is not passed around, no descriptor is being built. If you add another function that takes an assumed-shape array and pass bar to it, a descriptor for bar is created as an automatic variable (i.e. on the stack of foo). If the array is made static with the SAVE attribute, the descriptor is placed in the .bss section:

    movl    $bar.1580, %edi
    ...
    ; RAX still holds the address of the allocated memory as returned by malloc
    ; Computations, computations
    movl    -232(%rax,%rdx,4), %eax
    movl    %eax, baz_+4(%rip)
The first move prepares the argument of a function call (in my sample case call boo(bar), where boo has an interface that declares it as taking an assumed-shape array). It moves the address of the array descriptor of bar into EDI. This is a 32-bit immediate move, so the descriptor is expected to be in the first 2 GiB. Indeed, it is allocated in the .bss in both the small and medium memory models like this:

    .local  bar.1580
    .comm   bar.1580,72,32
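The 72 bytes reserved for bar.1580 are consistent with the array descriptor layout that, as far as I know, older gfortran releases used for a rank-2 array: a base address, an offset, a dtype word, and one (stride, lower bound, upper bound) triplet per dimension, each field 8 bytes wide on x86_64. Treat this as an assumption about the compiler version used here (newer releases use a larger descriptor); the class and field names below are illustrative and are not taken from the listing.

```python
import ctypes


class Dim(ctypes.Structure):
    """One dimension triplet (assumed layout)."""
    _fields_ = [("stride", ctypes.c_int64),
                ("lbound", ctypes.c_int64),
                ("ubound", ctypes.c_int64)]


class Rank2Descriptor(ctypes.Structure):
    """Assumed descriptor for a rank-2 array: three 8-byte words + 2 dims."""
    _fields_ = [("base_addr", ctypes.c_uint64),   # pointer to the data
                ("offset", ctypes.c_int64),
                ("dtype", ctypes.c_int64),
                ("dim", Dim * 2)]


print(ctypes.sizeof(Rank2Descriptor))  # 72
```

The descriptor is tiny compared with the data it describes, which is why it can stay in .bss under either memory model while the 2.4 GiB payload lives on the heap.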