GCC compilation very slow (large file)


Problem description

I am trying to compile a large C file (specifically, to build a MATLAB MEX function from it). The C file is around 20 MB (it is available from the GCC bug tracker if you want to experiment with it).

Here is the command I am running, followed by its output to the screen. It has been running for hours and, as you can see, optimization is already disabled (-O0). Why is this so slow? Is there a way I can make it faster?

(For reference: Ubuntu 12.04 (Precise Pangolin) 64-bit and GCC 4.7.3)

/usr/bin/gcc -c -DMX_COMPAT_32   -D_GNU_SOURCE -DMATLAB_MEX_FILE  -I"/usr/local/MATLAB/R2015a/extern/include" -I"/usr/local/MATLAB/R2015a/simulink/include" -ansi -fexceptions -fPIC -fno-omit-frame-pointer -pthread -O0 -DNDEBUG path/to/test4.c -o /tmp/mex_198714460457975_3922/test4.o -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.7.3-2ubuntu1~12.04' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04)
COLLECT_GCC_OPTIONS='-c' '-D' 'MX_COMPAT_32' '-D' '_GNU_SOURCE' '-D' 'MATLAB_MEX_FILE' '-I' '/usr/local/MATLAB/R2015a/extern/include' '-I' '/usr/local/MATLAB/R2015a/simulink/include' '-ansi' '-fexceptions' '-fPIC' '-fno-omit-frame-pointer' '-pthread' '-O0' '-D' 'NDEBUG' '-o' '/tmp/mex_198714460457975_3922/test4.o' '-v' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/4.7/cc1 -quiet -v -I /usr/local/MATLAB/R2015a/extern/include -I /usr/local/MATLAB/R2015a/simulink/include -imultilib . -imultiarch x86_64-linux-gnu -D_REENTRANT -D MX_COMPAT_32 -D _GNU_SOURCE -D MATLAB_MEX_FILE -D NDEBUG path/to/test4.c -quiet -dumpbase test4.c -mtune=generic -march=x86-64 -auxbase-strip /tmp/mex_198714460457975_3922/test4.o -O0 -ansi -version -fexceptions -fPIC -fno-omit-frame-pointer -fstack-protector -o /tmp/ccxDOA5f.s
GNU C (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) version 4.7.3 (x86_64-linux-gnu)
    compiled by GNU C version 4.7.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/MATLAB/R2015a/extern/include
 /usr/local/MATLAB/R2015a/simulink/include
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) version 4.7.3 (x86_64-linux-gnu)
    compiled by GNU C version 4.7.3, GMP version 5.0.2, MPFR version 3.1.0-p3, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: c119948b394d79ea05b6b3986ab084cf

Follow-up: I followed chqrlie's advice, and tcc compiled my function in under 5 seconds (I only had to remove the -ansi flag and change "gcc" to "tcc"), which is pretty remarkable, really. I can only imagine the complexity of GCC.

When I then try to mex it, however, mex typically needs a second command, which usually looks like this:

/usr/bin/gcc -pthread -Wl,--no-undefined -Wl,-rpath-link,/usr/local/MATLAB/R2015a/bin/glnxa64 -shared  -O -Wl,--version-script,"/usr/local/MATLAB/R2015a/extern/lib/glnxa64/mexFunction.map" /tmp/mex_61853296369424_4031/test4.o   -L"/usr/local/MATLAB/R2015a/bin/glnxa64" -lmx -lmex -lmat -lm -lstdc++ -o test4.mexa64

I cannot run this with tcc, as some of these flags are not compatible. If I try to run this second step with GCC instead, I get:

/usr/bin/ld: test4.o: relocation R_X86_64_PC32 against undefined symbol `mxGetPr' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

The solution appears to be Clang. tcc can compile the file, but the arguments in the second mex step are incompatible with tcc's options. Clang is very fast and produces a nice, small, optimized file.

Recommended answer

Upon testing, I found that the Clang compiler seems to have fewer problems compiling large files. Although Clang consumed almost a gigabyte of memory during compilation, it successfully turned OP's source code into a 70 kB object file. This works for all optimization levels I tested.

GCC was also able to compile this file quickly and without consuming too much memory when optimization is turned on. This bug in GCC comes from the large expression in OP's code, which places a huge burden on the register allocator. With optimizations turned on, the compiler performs an optimization called common subexpression elimination, which removes a lot of redundancy from OP's code and reduces both compilation time and object file size to manageable values.

Here are some tests with the testcase from the aforementioned bug report:

$ time gcc5 -O3 -c -o testcase.gcc5-O3.o testcase.c
real    0m39,30s
user    0m37,85s
sys     0m1,42s
$ time gcc5 -O0 -c -o testcase.gcc5-O0.o testcase.c
real    23m33,34s
user    23m27,07s
sys     0m5,92s
$ time tcc -c -o testcase.tcc.o testcase.c
real    0m2,60s
user    0m2,42s
sys     0m0,17s
$ time clang -O3 -c -o testcase.clang-O3.o testcase.c
real    0m13,71s
user    0m12,55s
sys     0m1,16s
$ time clang -O0 -c -o testcase.clang-O0.o testcase.c
real    0m17,63s
user    0m16,14s
sys     0m1,49s
$ time clang -Os -c -o testcase.clang-Os.o testcase.c
real    0m14,88s
user    0m13,73s
sys     0m1,11s
$ time clang -Oz -c -o testcase.clang-Oz.o testcase.c
real    0m13,56s
user    0m12,45s
sys     0m1,09s

Here are the sizes of the object files:

    text       data     bss      dec        hex filename
39101286          0       0 39101286    254a366 testcase.clang-O0.o
   72161          0       0    72161      119e1 testcase.clang-O3.o
   72087          0       0    72087      11997 testcase.clang-Os.o
   72087          0       0    72087      11997 testcase.clang-Oz.o
38683240          0       0 38683240    24e4268 testcase.gcc5-O0.o
   87500          0       0    87500      155cc testcase.gcc5-O3.o
   78239          0       0    78239      1319f testcase.gcc5-Os.o
69210504    3170616       0 72381120    45072c0 testcase.tcc.o
