Can the linker inline functions?

Question

In the file file1.c, there is a call to a function that is implemented in the file file2.c. When I link file1.o and file2.o into an executable, if the function in file2 is very small, will the linker automatically detect that the function is small and inline its call?

Answer

In addition to the support for Link Time Code Generation (LTCG) that James McNellis mentioned, the GCC toolchain also supports link-time optimization. Starting with version 4.5, GCC supports the -flto switch, which enables Link Time Optimization (LTO). This is a form of whole-program optimization that lets it inline functions from separate object files. It can also perform whatever other optimizations a compiler could make if it were compiling all the object files as though they came from a single C source file.
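Consider what that "single C source file" view means in practice. If both functions from the example below lived in one translation unit, an ordinary -O3 compile could already inline the call, because the compiler sees the callee's body at the call site. The following is a minimal sketch of such a hypothetical single-file ("unity build") version. The file name unity.c, the static qualifier, and the function name run_example() are assumptions for illustration and are not part of the original example.

```c
/* unity.c (hypothetical): test.c and print_int.c merged into one file. */
#include <stdio.h>

/* Body taken from print_int.c. Making it static (an assumption) keeps it
 * local to this file, so once every call has been inlined the compiler is
 * free to drop the out-of-line copy entirely. */
static void print_int(int x)
{
    printf("the int is %d\n", x);
}

/* Stands in for main() from test.c (renamed, hypothetically, so the sketch
 * can be linked into another program). Because print_int()'s body is
 * visible in this translation unit, gcc -O3 may inline all three calls
 * without -flto. */
int run_example(void)
{
    print_int(1);
    print_int(42);
    print_int(-1);
    return 0;
}
```

Inlining remains a heuristic decision, so inspecting the generated code (for example with gcc -O3 -S) is the reliable way to confirm it happened.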

Here's a simple example:

test.c:

void print_int(int x);

int main(){
    print_int(1);
    print_int(42);
    print_int(-1);

    return 0;
}

print_int.c:

#include <stdio.h>

void print_int( int x)
{
    printf( "the int is %d\n", x);
}

First compile them using GCC 4.5.x. The examples in the GCC docs use -O2, but to get visible results in my simple test, I had to use -O3:

C:\temp>gcc --version
gcc (GCC) 4.5.2

# compile with preparation for LTO
C:\temp>gcc -c -O3 -flto test.c
C:\temp>gcc -c -O3 -flto print_int.c

# link without LTO
C:\temp>gcc -o test-nolto.exe  print_int.o test.o

To get the effect of LTO, you're supposed to use the optimization options at the link stage as well. At that point, the linker actually invokes the compiler to compile the intermediate code that the compiler put into the object files in the first step above. If you don't also pass the optimization option at this stage, the compiler won't perform the inlining you're looking for.

# link using LTO
C:\temp>gcc -o test-lto.exe -flto -O3 print_int.o test.o

Disassembly of the version without link time optimization. Note that the calls are made to the print_int() function:

C:\temp>gdb test-nolto.exe
GNU gdb (GDB) 7.2
(gdb) start
Temporary breakpoint 1 at 0x401373
Starting program: C:\temp/test-nolto.exe
[New Thread 3324.0xdc0]

Temporary breakpoint 1, 0x00401373 in main ()
(gdb) disassem
Dump of assembler code for function main:
   0x00401370 <+0>:     push   %ebp
   0x00401371 <+1>:     mov    %esp,%ebp
=> 0x00401373 <+3>:     and    $0xfffffff0,%esp
   0x00401376 <+6>:     sub    $0x10,%esp
   0x00401379 <+9>:     call   0x4018ca <__main>
   0x0040137e <+14>:    movl   $0x1,(%esp)
   0x00401385 <+21>:    call   0x401350 <print_int>
   0x0040138a <+26>:    movl   $0x2a,(%esp)
   0x00401391 <+33>:    call   0x401350 <print_int>
   0x00401396 <+38>:    movl   $0xffffffff,(%esp)
   0x0040139d <+45>:    call   0x401350 <print_int>
   0x004013a2 <+50>:    xor    %eax,%eax
   0x004013a4 <+52>:    leave
   0x004013a5 <+53>:    ret

Disassembly of the version with link time optimization. Note that the calls to printf() are made directly:

C:\temp>gdb test-lto.exe

GNU gdb (GDB) 7.2
(gdb) start
Temporary breakpoint 1 at 0x401373
Starting program: C:\temp/test-lto.exe
[New Thread 1768.0x126c]

Temporary breakpoint 1, 0x00401373 in main ()
(gdb) disassem
Dump of assembler code for function main:
   0x00401370 <+0>:     push   %ebp
   0x00401371 <+1>:     mov    %esp,%ebp
=> 0x00401373 <+3>:     and    $0xfffffff0,%esp
   0x00401376 <+6>:     sub    $0x10,%esp
   0x00401379 <+9>:     call   0x4018da <__main>
   0x0040137e <+14>:    movl   $0x1,0x4(%esp)
   0x00401386 <+22>:    movl   $0x403064,(%esp)
   0x0040138d <+29>:    call   0x401acc <printf>
   0x00401392 <+34>:    movl   $0x2a,0x4(%esp)
   0x0040139a <+42>:    movl   $0x403064,(%esp)
   0x004013a1 <+49>:    call   0x401acc <printf>
   0x004013a6 <+54>:    movl   $0xffffffff,0x4(%esp)
   0x004013ae <+62>:    movl   $0x403064,(%esp)
   0x004013b5 <+69>:    call   0x401acc <printf>
   0x004013ba <+74>:    xor    %eax,%eax
   0x004013bc <+76>:    leave
   0x004013bd <+77>:    ret
End of assembler dump.
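In source terms, the LTO build's main() behaves as if the body of print_int() had been substituted at each call site. The sketch below writes that out by hand. It is only an illustration, because the compiler performs the substitution on its intermediate representation and not on C text. The name main_after_inlining() is hypothetical.

```c
#include <stdio.h>

/* Hand-written equivalent of main() in test-lto.exe: each print_int(x)
 * call is replaced by print_int()'s body, so printf() is called directly.
 * In the disassembly above, every call passes the same format-string
 * address (0x403064) together with the integer argument. */
int main_after_inlining(void)
{
    printf("the int is %d\n", 1);   /* was print_int(1)  */
    printf("the int is %d\n", 42);  /* was print_int(42) */
    printf("the int is %d\n", -1);  /* was print_int(-1) */
    return 0;
}
```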

And here's the same experiment with MSVC (first with LTCG):

C:\temp>cl -c /GL /Zi /Ox test.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c

C:\temp>cl -c /GL /Zi /Ox print_int.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

print_int.c

C:\temp>link /LTCG test.obj print_int.obj /out:test-ltcg.exe /debug
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

Generating code
Finished generating code

C:\temp>"\Program Files (x86)\Debugging Tools for Windows (x86)"\cdb test-ltcg.exe

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: test-ltcg.exe
    // ...
0:000> u main
*** WARNING: Unable to verify checksum for test-ltcg.exe
test_ltcg!main:
00cd1c20 6a01            push    1
00cd1c22 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
00cd1c27 e8e3f3feff      call    test_ltcg!printf (00cc100f)
00cd1c2c 6a2a            push    2Ah
00cd1c2e 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
00cd1c33 e8d7f3feff      call    test_ltcg!printf (00cc100f)
00cd1c38 6aff            push    0FFFFFFFFh
00cd1c3a 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
00cd1c3f e8cbf3feff      call    test_ltcg!printf (00cc100f)
00cd1c44 83c418          add     esp,18h
00cd1c47 33c0            xor     eax,eax
00cd1c49 c3              ret
0:000>

Now without LTCG. Note that with MSVC you have to compile the .c files without /GL to prevent the linker from performing LTCG. Otherwise the linker detects that /GL was specified and forces the /LTCG option (hey, that's what you said you wanted the first time around with /GL):

C:\temp>cl -c /Zi /Ox test.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c

C:\temp>cl -c /Zi /Ox print_int.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

print_int.c

C:\temp>link test.obj print_int.obj /out:test-noltcg.exe /debug
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

C:\temp>"\Program Files (x86)\Debugging Tools for Windows (x86)"\cdb test-noltcg.exe

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.

CommandLine: test-noltcg.exe
// ...
0:000> u main
test_noltcg!main:
00c41020 6a01            push    1
00c41022 e8e3ffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
00c41027 6a2a            push    2Ah
00c41029 e8dcffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
00c4102e 6aff            push    0FFFFFFFFh
00c41030 e8d5ffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
00c41035 83c40c          add     esp,0Ch
00c41038 33c0            xor     eax,eax
00c4103a c3              ret
0:000>

One thing that Microsoft's linker supports in LTCG that is not supported by GCC (as far as I know) is Profile Guided Optimization (PGO). That technology allows Microsoft's linker to optimize based on profiling data gathered from previous runs of the program. This allows the linker to do things such as gather 'hot' functions onto the same memory pages and seldom-used code sequences onto other memory pages to reduce the working set of a program.
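GCC (and Clang) let you request a similar hot/cold layout by hand, using the non-standard hot and cold function attributes. According to the GCC manual, on many targets such functions are placed into special subsections of the text section, so that hot code and cold code are each grouped together. This is a manual approximation and not PGO itself. The function names below are hypothetical.

```c
#include <stdio.h>

/* 'cold': rarely executed. GCC optimizes it for size, treats paths that
 * call it as unlikely, and on many targets groups it with other cold code.
 * noinline keeps its body out of the hot loop. (GCC/Clang extension.) */
__attribute__((cold, noinline))
static void report_negative(int x)
{
    fprintf(stderr, "skipping negative value: %d\n", x);
}

/* 'hot': frequently executed. GCC optimizes it more aggressively and on
 * many targets groups it with other hot functions. */
__attribute__((hot))
long sum_non_negative(const int *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            report_negative(v[i]);  /* expected to be rare */
            continue;
        }
        sum += v[i];
    }
    return sum;
}
```

The GCC documentation also notes that when profile feedback is available via -fprofile-use, hot and cold functions are detected automatically and these attributes are ignored.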

Edit (28 Aug 2011): GCC supports profile-guided optimization using options such as -fprofile-generate and -fprofile-use, but I'm completely uninformed about them.

Thanks to Konrad Rudolph for pointing this out to me.
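Here is a rough sketch of how those GCC options are typically combined. First build with -fprofile-generate, then run the program on representative input so that it writes profile data (.gcda files), then rebuild with -fprofile-use. Without a profile, GCC's __builtin_expect is a manual way to supply the same kind of branch-probability information, although the GCC manual itself recommends preferring actual profile feedback. The file name app.c, the UNLIKELY macro, and count_out_of_range() are hypothetical.

```c
/* Possible PGO workflow (file names hypothetical):
 *   gcc -O2 -fprofile-generate app.c -o app   # instrumented build
 *   ./app < typical-input.txt                 # run writes .gcda profile data
 *   gcc -O2 -fprofile-use app.c -o app        # optimized rebuild using it
 */
#include <stddef.h>

/* Manual branch hint built on GCC's __builtin_expect(expr, expected):
 * the value of expr is returned unchanged, and the second argument tells
 * the optimizer which value it should expect. */
#define UNLIKELY(c) __builtin_expect(!!(c), 0)

/* Counts values outside [lo, hi]. Out-of-range input is assumed rare, which
 * is exactly the kind of guess a profiling run would measure instead. */
size_t count_out_of_range(const int *v, size_t n, int lo, int hi)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < lo || v[i] > hi))
            bad++;
    }
    return bad;
}
```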