gfortran for dummies: What does mcmodel=medium do exactly?

Problem description

I have some code that gives me relocation errors when compiling. Below is an example that illustrates the problem:

      program main
      common/baz/a,b,c
      real a,b,c
      b = 0.0
      call foo()
      print*, b
      end

      subroutine foo()
      common/baz/a,b,c
      real a,b,c

      integer, parameter :: nx = 450
      integer, parameter :: ny = 144
      integer, parameter :: nz = 144
      integer, parameter :: nf = 23*3
      ! large local array: 69 x 9331200 default reals
      real :: bar(nf,nx*ny*nz)

      ! allocatable variant (links without -mcmodel=medium):
      !real, allocatable,dimension(:,:) :: bar
      !allocate(bar(nf,nx*ny*nz))

      bar = 1.0
      b = bar(12,32*138*42)

      return
      end

Compiling this with gfortran -O3 -g -o test test.f, I get the following error:

relocation truncated to fit: R_X86_64_PC32 against symbol `baz_' defined in COMMON section in /tmp/ccIkj6tt.o

But it works if I use gfortran -O3 -mcmodel=medium -g -o test test.f. Also note that it works if I make the array allocatable and allocate it within the subroutine.

My question is what exactly does -mcmodel=medium do? I was under the impression that the two versions of the code (the one with allocatable arrays and the one without) were more or less equivalent ...

Recommended answer

Since bar is quite large, the compiler generates a static allocation instead of an automatic allocation on the stack. Static arrays are created with the .comm assembly directive, which creates an allocation in the so-called COMMON section. Symbols from that section are gathered, same-named symbols are merged (reduced to a single symbol request whose size equals the largest size requested), and what is left is then mapped to the BSS (uninitialised data) section in most executable formats. With ELF executables the .bss section is located in the data segment, just before the data segment part of the heap (there is another heap part, managed by anonymous memory mappings, which does not reside in the data segment).
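
To put a number on "quite large", the size of bar can be worked out from its declaration. The following is a small Python sketch used only for the arithmetic; the variable names mirror the Fortran parameters, and the 4-byte element size is an assumption (it is the size of a default REAL with gfortran's default settings):

```python
# Size of bar(nf, nx*ny*nz) as declared in subroutine foo.
nx, ny, nz = 450, 144, 144
nf = 23 * 3
REAL_BYTES = 4  # assumption: default REAL occupies 4 bytes

elements = nf * nx * ny * nz
size_bytes = elements * REAL_BYTES

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

print(size_bytes)              # 2575411200
print(size_bytes > INT32_MAX)  # True
print(size_bytes / 2**30)      # size in GiB, roughly 2.4
```

The result matches the size given to the .comm directive for bar.1535 in the assembly output, and it is larger than what a signed 32-bit offset can span.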

With the small memory model, 32-bit addressing instructions are used to address symbols on x86_64. This makes the code smaller and also faster. Here is some assembly output when using the small memory model:

movl    $bar.1535, %ebx    <---- Instruction length saving
...
movl    %eax, baz_+4(%rip) <---- Problem!!
...
.local  bar.1535
.comm   bar.1535,2575411200,32
...
.comm   baz_,12,16

This uses a 32-bit move instruction (5 bytes long) to put the value of the bar.1535 symbol (this value equals the address of the symbol's location) into the lower 32 bits of the RBX register (the upper 32 bits get zeroed). The bar.1535 symbol itself is allocated using the .comm directive. Memory for the baz COMMON block is allocated afterwards. Because bar.1535 is very large, baz_ ends up more than 2 GiB from the start of the .bss section. This poses a problem for the second movl instruction, since addressing the b variable, into which the value of EAX has to be moved, would need an offset from RIP that does not fit in a signed 32-bit value. This is only detected at link time. The assembler itself does not know the appropriate offset since it doesn't know what the value of the instruction pointer (RIP) would be (it depends on the absolute virtual address where the code is loaded, and this is determined by the linker), so it simply puts in an offset of 0 and creates a relocation request of type R_X86_64_PC32. That request instructs the linker to patch the value of 0 with the real offset value. But the linker cannot do that, since the offset value would not fit inside a signed 32-bit integer, and hence it bails out.
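
The range check that the linker effectively performs for an R_X86_64_PC32 relocation can be sketched as follows. This is a simplified Python model, not the linker's actual code: the function name fits_pc32 and all addresses are made up for illustration, and alignment padding is ignored.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_pc32(target, next_insn):
    """True if the RIP-relative displacement fits in a signed 32-bit field."""
    disp = target - next_insn
    return INT32_MIN <= disp <= INT32_MAX

# Hypothetical layout for the small memory model (addresses are invented).
code_addr = 0x400000      # value of RIP after the movl instruction
bss_start = 0x600000      # start of .bss, where bar.1535 is placed
bar_size = 2575411200     # size from the .comm directive
baz_addr = bss_start + bar_size  # baz_ lands after bar.1535

print(fits_pc32(bss_start, code_addr))     # True
print(fits_pc32(baz_addr + 4, code_addr))  # False
```

The second check corresponds to baz_+4(%rip): with bar.1535 sitting in front of it, the displacement exceeds 2 GiB and the relocation has to be truncated, which is what the error message reports.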

With the medium memory model in place things look like this:

movabsq $bar.1535, %r10
...
movl    %eax, baz_+4(%rip)
...
.local  bar.1535
.largecomm      bar.1535,2575411200,32
...
.comm   baz_,12,16

First, a 64-bit immediate move instruction (10 bytes long) is used to put the 64-bit value that represents the address of bar.1535 into register R10. Memory for the bar.1535 symbol is allocated using the .largecomm directive, and thus it ends up in the .lbss section of the ELF executable. .lbss is used to store symbols which might not fit in the first 2 GiB (and hence should not be addressed using 32-bit instructions or RIP-relative addressing), while smaller things go to .bss (baz_ is still allocated using .comm and not .largecomm). Since the .lbss section is placed after the .bss section in the ELF linker script, baz_ does not end up being inaccessible using 32-bit RIP-relative addressing.
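
The effect of moving bar.1535 into .lbss can be illustrated with the same kind of toy layout. This is only a sketch under assumed addresses: the numbers are invented, and the 16-byte padding of baz_ is an assumption based on the alignment given in its .comm directive.

```python
INT32_MAX = 2**31 - 1

# Hypothetical layout for the medium memory model (addresses are invented).
code_addr = 0x400000       # value of RIP after the movl instruction
bss_start = 0x600000       # start of .bss
bar_size = 2575411200      # size from the .largecomm directive

baz_addr = bss_start           # baz_ stays in .bss
lbss_start = bss_start + 16    # assumption: .lbss follows the padded baz_
bar_addr = lbss_start          # bar.1535 now lives in .lbss

baz_disp = (baz_addr + 4) - code_addr  # displacement for baz_+4(%rip)
bar_end = bar_addr + bar_size          # one past the last byte of bar

print(baz_disp <= INT32_MAX)  # True
print(bar_end > INT32_MAX)    # True
```

In this model baz_ is close to the code again, so the 32-bit RIP-relative store works, while bar.1535 extends past the 2 GiB mark and is therefore addressed through a full 64-bit value.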

All addressing modes are described in the System V ABI: AMD64 Architecture Processor Supplement. It is heavy technical reading, but a must-read for anybody who really wants to understand how 64-bit code works on most x86_64 Unixes.

When an ALLOCATABLE array is used instead, gfortran allocates heap memory (most likely implemented as an anonymous memory map given the large size of the allocation):

movl    $2575411200, %edi
...
call    malloc
movq    %rax, %rdi

This is basically RDI = malloc(2575411200). From then on elements of bar are accessed by using positive offsets from the value stored in RDI:

movl    51190040(%rdi), %eax
movl    %eax, baz_+4(%rip)

For locations that are more than 2 GiB from the start of bar, a more elaborate method is used. For example, to implement b = bar(12,144*144*450), gfortran emits:

; Some computations that leave the offset in RAX
movl    (%rdi,%rax), %eax
movl    %eax, baz_+4(%rip)
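
The two displacements can be reproduced from Fortran's column-major layout. The helper byte_offset below is a hypothetical name introduced for this sketch; it assumes 1-based indices, a leading dimension of nf = 69 and 4-byte elements.

```python
INT32_MAX = 2**31 - 1

def byte_offset(i, j, nf=69, elem_bytes=4):
    """Byte offset of bar(i, j) from the start of bar (column-major, 1-based)."""
    return ((j - 1) * nf + (i - 1)) * elem_bytes

near = byte_offset(12, 32 * 138 * 42)    # element used in the question
far = byte_offset(12, 144 * 144 * 450)   # element in the last column

print(near)  # 51190040
print(far)   # 2575410968
print(near <= INT32_MAX, far <= INT32_MAX)  # True False
```

The first value is exactly the constant in 51190040(%rdi), which fits in the signed 32-bit displacement field of the instruction. The second does not fit, which is consistent with the offset being computed into RAX and applied through the (%rdi,%rax) addressing form instead.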

This code is not affected by the memory model, since nothing is assumed about the address where the dynamic allocation would be made. Also, since the array is not passed around, no descriptor is built. If you add another function that takes an assumed-shape array and pass bar to it, a descriptor for bar is created as an automatic variable (i.e. on the stack of foo). If the array is made static with the SAVE attribute, the descriptor is placed in the .bss section:

movl    $bar.1580, %edi
...
; RAX still holds the address of the allocated memory as returned by malloc
; Computations, computations
movl    -232(%rax,%rdx,4), %eax
movl    %eax, baz_+4(%rip)

The first move prepares the argument of a function call (in my sample case call boo(bar) where boo has an interface that declares it as taking an assumed-shape array). It moves the address of the array descriptor of bar into EDI. This is a 32-bit immediate move so the descriptor is expected to be in the first 2 GiB. Indeed, it is allocated in the .bss in both small and medium memory models like this:

.local  bar.1580
.comm   bar.1580,72,32
