Why use the Global Offset Table for symbols defined in the shared library itself?


Problem description

Consider the following simple shared library source code:

library.cpp:

static int global = 10;

int foo()
{
    return global;
}

Compiled with clang using the -fPIC option, it produces this assembly (x86-64):

foo(): # @foo()
  push rbp
  mov rbp, rsp
  mov eax, dword ptr [rip + global]
  pop rbp
  ret
global:
  .long 10 # 0xa

Since the symbol is defined inside the library, the compiler uses PC-relative addressing, as expected: mov eax, dword ptr [rip + global]

However, if we change static int global = 10; to int global = 10;, making it a symbol with external linkage, the resulting assembly is:

foo(): # @foo()
  push rbp
  mov rbp, rsp
  mov rax, qword ptr [rip + global@GOTPCREL]
  mov eax, dword ptr [rax]
  pop rbp
  ret
global:
  .long 10 # 0xa

As you can see, the compiler added a layer of indirection through the Global Offset Table, which seems totally unnecessary in this case, as the symbol is still defined inside the same library (and source file).

If the symbol were defined in another shared library, the GOT would be necessary, but in this case it feels redundant. Why does the compiler still add this symbol to the GOT?

Note: I believe this question is similar to this one; however, the answer there was not pertinent, perhaps due to a lack of details.

Solution

The Global Offset Table serves two purposes. One is to allow the dynamic linker to "interpose" a different definition of the variable from the executable or another shared object. The second is to allow position-independent code to be generated for references to variables on certain processor architectures.

ELF dynamic linking treats the entire process, the executable and all of the shared objects (dynamic libraries), as sharing one single global namespace. If multiple components (executable or shared objects) define the same global symbol, then the dynamic linker normally chooses one definition of that symbol, and all references to that symbol in all components refer to that one definition. (However, ELF dynamic symbol resolution is complex, and for various reasons different components can end up using different definitions of the same global symbol.)

To implement this, when building a shared library the compiler accesses global variables indirectly through the GOT. For each variable, an entry is created in the GOT containing a pointer to the variable. As your example code shows, the compiler then uses this entry to obtain the address of the variable instead of trying to access it directly. When the shared object is loaded into a process, the dynamic linker determines whether any of the global variables have been superseded by variable definitions in another component. If so, those global variables have their GOT entries updated to point at the superseding variable.
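The mechanism described above can be modelled in ordinary C++ as a pointer that gets retargeted. This is only a toy sketch: it does not touch a real GOT, and the names library_global, executable_global, got_entry_global and interpose_global are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>

// Definition of `global` inside the shared library.
int library_global = 10;

// A competing definition of `global` in another component,
// for example the executable (hypothetical).
int executable_global = 42;

// Stand-in for one GOT slot: holds the address of whichever definition won.
int* got_entry_global = &library_global;

// Roughly what foo() does under -fPIC for a default-visibility global:
// load the pointer from the GOT slot, then load the value through it.
int foo() {
    int* address = got_entry_global;  // like: mov rax, [rip + global@GOTPCREL]
    return *address;                  // like: mov eax, [rax]
}

// Stand-in for the dynamic linker deciding that another definition
// takes precedence and updating the slot at load time.
void interpose_global(int* winning_definition) {
    assert(winning_definition != nullptr);
    got_entry_global = winning_definition;
}
```

In this model foo() never needs to be recompiled or patched when the definition changes; only the slot is rewritten, which is the property the GOT indirection buys.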

By using the "hidden" or "protected" ELF visibility attributes, it's possible to prevent a globally defined symbol from being superseded by a definition in another component, and thus remove the need to use the GOT on certain architectures. For example:

extern int global_visible;
extern int global_hidden __attribute__((visibility("hidden")));
static volatile int local;  // volatile, so it's not optimized away

int
foo() {
    return global_visible + global_hidden + local;
}

When compiled with -O3 -fPIC by the x86_64 port of GCC, this generates:

foo():
        mov     rcx, QWORD PTR global_visible@GOTPCREL[rip]
        mov     edx, DWORD PTR local[rip]
        mov     eax, DWORD PTR global_hidden[rip]
        add     eax, DWORD PTR [rcx]
        add     eax, edx
        ret 

As you can see, only global_visible uses the GOT; global_hidden and local don't use it. The "protected" visibility works similarly: it prevents the definition from being superseded but keeps it visible to the dynamic linker, so it can still be accessed by other components. The "hidden" visibility hides the symbol completely from the dynamic linker.
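As a rough sketch of how the three visibility levels look when applied to definitions rather than declarations, assuming GCC or clang targeting ELF, something like the following could be used. The variable and function names are made up for illustration.

```cpp
// Default visibility: exported and interposable, so -fPIC code in this
// library is expected to reach it through the GOT.
__attribute__((visibility("default"))) int exported_counter = 1;

// Protected visibility: still exported, but references from inside this
// component are meant to bind to this definition.
__attribute__((visibility("protected"))) int protected_counter = 2;

// Hidden visibility: not exported from the shared object at all.
__attribute__((visibility("hidden"))) int hidden_counter = 3;

// Reads all three variables; which loads go through the GOT depends on
// the target and compiler, not on anything visible in this source.
int sum_counters() {
    return exported_counter + protected_counter + hidden_counter;
}
```

The exact instructions generated for protected data can differ between compilers and versions, so it is worth checking the output of -S (for example with -O2 -fPIC) rather than assuming the GOT is skipped. Both GCC and clang also accept -fvisibility=hidden to make hidden the default for a whole translation unit.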

Code must be relocatable so that shared objects can be loaded at different addresses in different processes. As a result, statically allocated variables, whether they have global or local scope, can't be accessed directly with a single instruction on most architectures. The only exception I know of is the 64-bit x86 architecture, as you see above. It supports memory operands that are PC-relative and have large 32-bit displacements, which can reach any variable defined in the same component.

On all the other architectures I'm familiar with, accessing variables in a position-independent manner requires multiple instructions. Exactly how varies greatly by architecture, but it often involves using the GOT. For example, if you compile the example C code above with the x86_64 port of GCC using the -m32 -O3 -fPIC options, you get:

foo():
        call    __x86.get_pc_thunk.dx
        add     edx, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        push    ebx
        mov     ebx, DWORD PTR global_visible@GOT[edx]
        mov     ecx, DWORD PTR local@GOTOFF[edx]
        mov     eax, DWORD PTR global_hidden@GOTOFF[edx]
        add     eax, DWORD PTR [ebx]
        pop     ebx
        add     eax, ecx
        ret
__x86.get_pc_thunk.dx:
        mov     edx, DWORD PTR [esp]
        ret

The GOT is used for all three variable accesses, but if you look closely, global_hidden and local are handled differently from global_visible. With the latter, a pointer to the variable is loaded from the GOT; the former two are accessed directly at an offset from the GOT base. This is a fairly common trick among architectures where the GOT is used for all position-independent variable references.

The 32-bit x86 architecture is exceptional in one way here, since it has large 32-bit displacements and a 32-bit address space. This means that anywhere in memory can be accessed relative to the GOT base, not just the GOT itself. Most other architectures support only much smaller displacements, which makes the maximum distance something can be from the GOT base much smaller. Other architectures that use this trick put only small (local/hidden/protected) variables in the GOT itself; large variables are stored outside the GOT, and the GOT contains a pointer to the variable, just as with normal-visibility global variables.
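The difference between the @GOT and @GOTOFF accesses in the 32-bit listing can be illustrated with a toy model. The ModuleImage layout below is an assumption made purely for illustration (a real GOT and data section are laid out by the linker, and the offsets may be negative); load_gotoff and load_via_got_slot are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstring>

// Toy image of one loaded component: a GOT slot followed by its own data.
struct ModuleImage {
    int* got_global_visible;  // GOT slot: pointer to an interposable variable
    int  global_hidden;       // lives at a fixed offset from the GOT base
    int  local;               // likewise
};

// Models `mov reg, DWORD PTR sym@GOTOFF[edx]`: one load at GOT base + offset.
int load_gotoff(const ModuleImage& image, std::size_t offset) {
    const unsigned char* got_base =
        reinterpret_cast<const unsigned char*>(&image);
    int value;
    std::memcpy(&value, got_base + offset, sizeof value);
    return value;
}

// Models `mov ebx, DWORD PTR sym@GOT[edx]` followed by a load through ebx:
// first fetch the pointer from the GOT slot, then dereference it.
int load_via_got_slot(const ModuleImage& image) {
    return *image.got_global_visible;
}

// Same sum as foo() in the listing, using one indirect and two direct loads.
int foo(const ModuleImage& image) {
    return load_via_got_slot(image)
         + load_gotoff(image, offsetof(ModuleImage, global_hidden))
         + load_gotoff(image, offsetof(ModuleImage, local));
}
```

Only got_global_visible can be redirected to a variable outside the image; the other two are always found at a fixed distance from the base, which is why they need one memory access instead of two.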
