gfortran for dummies: What does mcmodel=medium do exactly?


Problem description

I have some code that gives me relocation errors when compiling. Below is an example that illustrates the problem:

      program main
      common/baz/a,b,c
      real a,b,c
      b = 0.0
      call foo()
      print*, b
      end

      subroutine foo()
      common/baz/a,b,c
      real a,b,c

      integer, parameter :: nx = 450
      integer, parameter :: ny = 144
      integer, parameter :: nz = 144
      integer, parameter :: nf = 23*3
      real :: bar(nf,nx*ny*nz)

      !real, allocatable,dimension(:,:) :: bar
      !allocate(bar(nf,nx*ny*nz))

      bar = 1.0
      b = bar(12,32*138*42)

      return
      end

Compiling this with gfortran -O3 -g -o test test.f, I get the following error:

relocation truncated to fit: R_X86_64_PC32 against symbol `baz_' defined in COMMON section in /tmp/ccIkj6tt.o

But it works if I use gfortran -O3 -mcmodel=medium -g -o test test.f. Also note that it works if I make the array allocatable and allocate it within the subroutine.

My question is: what exactly does -mcmodel=medium do? I was under the impression that the two versions of the code (the one with allocatable arrays and the one without) were more or less equivalent ...

Solution

Since bar is quite large, the compiler generates a static allocation instead of an automatic allocation on the stack. Static arrays are created with the .comm assembly directive, which creates an allocation in the so-called COMMON section. Symbols from that section are gathered, same-named symbols are merged (reduced to one symbol request with size equal to the largest size requested), and what is left is mapped to the BSS (uninitialised data) section in most executable formats. With ELF executables the .bss section is located in the data segment, just before the data segment part of the heap (there is another heap part, managed by anonymous memory mappings, which does not reside in the data segment).

With the small memory model, 32-bit addressing instructions are used to address symbols on x86_64. This makes the code smaller and also faster. Some assembly output when using the small memory model:

movl    $bar.1535, %ebx    # <---- Instruction length saving
...
movl    %eax, baz_+4(%rip) # <---- Problem!!
...
.local  bar.1535
.comm   bar.1535,2575411200,32
...
.comm   baz_,12,16

This uses a 32-bit move instruction (5 bytes long) to put the value of the bar.1535 symbol (this value equals the address of the symbol location) into the lower 32 bits of the RBX register (the upper 32 bits get zeroed). The bar.1535 symbol itself is allocated using the .comm directive. Memory for the baz COMMON block is allocated afterwards. Because bar.1535 is very large, baz_ ends up more than 2 GiB from the start of the .bss section. This poses a problem for the second movl instruction, since addressing the b variable, into which the value of EAX has to be moved, would require an offset from RIP that does not fit in a signed 32-bit integer. This is only detected at link time. The assembler itself does not know the appropriate offset, since it does not know what the value of the instruction pointer (RIP) will be (it depends on the absolute virtual address where the code is loaded, and this is determined by the linker), so it simply puts in an offset of 0 and creates a relocation request of type R_X86_64_PC32. This instructs the linker to patch the value of 0 with the real offset value. The linker cannot do that, because the offset value does not fit inside a signed 32-bit integer, and hence it bails out.
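
The size in the .comm directive and the reason the relocation fails can be checked with a little arithmetic. The following Python sketch recomputes the size of bar from its declaration and tests whether a value of that magnitude fits in the signed 32-bit field that R_X86_64_PC32 patches. It assumes the default 4-byte REAL, and the helper name fits_signed_32 is made up for illustration. The real displacement is measured from the instruction, not from the start of bar, but it has to span the whole array either way.

```python
# Dimensions taken from the declaration of bar in subroutine foo.
NF, NX, NY, NZ = 23 * 3, 450, 144, 144
REAL_BYTES = 4  # assumption: default REAL is 4 bytes (no -fdefault-real-8)


def fits_signed_32(value: int) -> bool:
    """Return True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= 2**31 - 1


bar_bytes = NF * NX * NY * NZ * REAL_BYTES
print(bar_bytes)                  # size requested by the .comm directive
print(fits_signed_32(bar_bytes))  # whether an offset this large can be patched in
```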

With the medium memory model in place things look like this:

movabsq $bar.1535, %r10
...
movl    %eax, baz_+4(%rip)
...
.local  bar.1535
.largecomm      bar.1535,2575411200,32
...
.comm   baz_,12,16

First, a 64-bit immediate move instruction (10 bytes long) is used to put the 64-bit value that represents the address of bar.1535 into register R10. Memory for the bar.1535 symbol is allocated using the .largecomm directive and thus it ends up in the .lbss section of the ELF executable. .lbss is used to store symbols which might not fit in the first 2 GiB (and hence should not be addressed using 32-bit instructions or RIP-relative addressing), while smaller things go to .bss (baz_ is still allocated using .comm and not .largecomm). Since the .lbss section is placed after the .bss section in the ELF linker script, baz_ remains reachable with 32-bit RIP-relative addressing.
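
The effect of splitting symbols between .bss and .lbss can be illustrated with a toy layout model. This Python sketch is a deliberate simplification, not how the linker actually works: it ignores alignment and every other section, and the function name layout is hypothetical. The threshold parameter stands in for what I understand to be GCC's -mlarge-data-threshold option, and the default value used here is an assumption you should check against your compiler's documentation.

```python
GIB = 2**30


def layout(symbols, medium_model, threshold=65535):
    """Assign byte offsets from the start of .bss, with .lbss placed after it.

    symbols: list of (name, size_in_bytes) in allocation order.
    Alignment padding is ignored to keep the model simple.
    """
    small, large = [], []
    for name, size in symbols:
        # Only the medium model diverts big objects to the large section.
        if medium_model and size > threshold:
            large.append((name, size))
        else:
            small.append((name, size))
    offsets, cursor = {}, 0
    for name, size in small + large:  # .bss contents first, then .lbss
        offsets[name] = cursor
        cursor += size
    return offsets


symbols = [("bar.1535", 2575411200), ("baz_", 12)]
for medium in (False, True):
    off = layout(symbols, medium)
    model = "medium" if medium else "small"
    print(model, off["baz_"], off["baz_"] < 2 * GIB)
```

In the small model baz_ lands behind the 2.4 GiB array; in the medium model it sits at the front of .bss and the array is pushed out to .lbss.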

All addressing modes are described in the System V ABI: AMD64 Architecture Processor Supplement. It is heavy technical reading, but a must-read for anybody who really wants to understand how 64-bit code works on most x86_64 Unixes.

When an ALLOCATABLE array is used instead, gfortran allocates heap memory (most likely implemented as an anonymous memory map given the large size of the allocation):

movl    $2575411200, %edi
...
call    malloc
movq    %rax, %rdi

This is basically RDI = malloc(2575411200). From then on elements of bar are accessed by using positive offsets from the value stored in RDI:

movl    51190040(%rdi), %eax
movl    %eax, baz_+4(%rip)
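
The displacement 51190040 in the first instruction is simply the byte offset of bar(12,32*138*42) from the start of the array. A minimal Python sketch of that calculation, assuming column-major storage, 1-based indices, and 4-byte REAL elements (the helper name element_offset is made up for illustration):

```python
def element_offset(i, j, nrows, elem_bytes=4):
    """Byte offset of bar(i, j) in a column-major array with 1-based indices."""
    return ((j - 1) * nrows + (i - 1)) * elem_bytes


NF = 23 * 3  # number of rows in bar
print(element_offset(12, 32 * 138 * 42, NF))  # compare with the movl displacement
```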

For locations that are more than 2 GiB from the start of bar, a more elaborate method is used. E.g. to implement b = bar(12,144*144*450) gfortran emits:

# Some computations that leave the offset in RAX
movl    (%rdi,%rax), %eax
movl    %eax, baz_+4(%rip)
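
The reason the offset has to go through a register here is that an x86-64 memory operand can only encode a signed 32-bit displacement. The Python sketch below, under the same assumptions as before (column-major layout, 4-byte REAL, hypothetical helper name), computes the byte offset of an element in the last column of bar and compares it with that limit:

```python
NF, NX, NY, NZ = 23 * 3, 450, 144, 144
DISP32_MAX = 2**31 - 1  # largest displacement encodable in a memory operand


def last_column_offset(i, elem_bytes=4):
    """Byte offset of bar(i, NX*NY*NZ), i.e. row i of the last column."""
    return ((NX * NY * NZ - 1) * NF + (i - 1)) * elem_bytes


off = last_column_offset(12)
print(off, off > DISP32_MAX)  # offset in bytes, and whether it exceeds disp32
```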

This code is not affected by the memory model, since nothing is assumed about the address at which the dynamic allocation is made. Also, since the array is not passed around, no descriptor is built. If you add another function that takes an assumed-shape array and pass bar to it, a descriptor for bar is created as an automatic variable (i.e. on the stack of foo). If the array is made static with the SAVE attribute, the descriptor is placed in the .bss section:

movl    $bar.1580, %edi
...
# RAX still holds the address of the allocated memory as returned by malloc
# Computations, computations
movl    -232(%rax,%rdx,4), %eax
movl    %eax, baz_+4(%rip)

The first move prepares the argument of a function call (in my sample case call boo(bar) where boo has an interface that declares it as taking an assumed-shape array). It moves the address of the array descriptor of bar into EDI. This is a 32-bit immediate move so the descriptor is expected to be in the first 2 GiB. Indeed, it is allocated in the .bss in both small and medium memory models like this:

.local  bar.1580
.comm   bar.1580,72,32
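
The 72 bytes reserved for bar.1580 are consistent with a rank-2 array descriptor. The Python sketch below models one plausible layout with ctypes: a base address, an offset, a dtype word, and one stride/lower-bound/upper-bound triplet per dimension, all 8 bytes wide. This layout is an assumption based on my understanding of older gfortran releases; the field names are illustrative and other compiler versions may use a different, larger descriptor.

```python
import ctypes


class Dim(ctypes.Structure):
    # One triplet per array dimension (assumed layout).
    _fields_ = [
        ("stride", ctypes.c_int64),
        ("lbound", ctypes.c_int64),
        ("ubound", ctypes.c_int64),
    ]


class Descriptor2D(ctypes.Structure):
    # Assumed descriptor for a rank-2 array; c_uint64 stands in for a pointer.
    _fields_ = [
        ("base_addr", ctypes.c_uint64),
        ("offset", ctypes.c_int64),
        ("dtype", ctypes.c_int64),
        ("dim", Dim * 2),
    ]


print(ctypes.sizeof(Descriptor2D))  # compare with the size in the .comm directive
```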
