std::array with aggregate initialization on g++ generates huge code

Posted: 2016/10/16 14:48:12 | Tags: c++ optimization g++ stdarray loop-unrolling


Question


On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable:

#include <array>
#include <iostream>

int main()
{
    constexpr std::size_t size = 4096;

    struct S
    {
        float f;
        S() : f(0.0f) {}
    };

    std::array<S, size> a = {};  // <-- note aggregate initialization

    for (auto& e : a)
        std::cerr << e.f;

    return 0;
}

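Increasing size increases compilation time and executable size linearly. I can't reproduce this behaviour with clang 3.5 or Visual C++ 2015. Using -Os makes no difference.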


$ time g++ -O2 -std=c++11 test.cpp
real    0m4.178s
user    0m4.060s
sys     0m0.068s

Inspecting the assembly code reveals that the initialization of a is unrolled, generating 4096 movl instructions:
main:
.LFB1313:
    .cfi_startproc
    pushq   %rbx
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    subq    $16384, %rsp
    .cfi_def_cfa_offset 16400
    movl    $0x00000000, (%rsp)
    movl    $0x00000000, 4(%rsp)
    movq    %rsp, %rbx
    movl    $0x00000000, 8(%rsp)
    movl    $0x00000000, 12(%rsp)
    movl    $0x00000000, 16(%rsp)
       [...skipping 4000 lines...]
    movl    $0x00000000, 16376(%rsp)
    movl    $0x00000000, 16380(%rsp)

This only happens when S has a non-trivial constructor and the array is initialized using {}. If I do any of the following, g++ generates a simple loop:
 


Remove S::S();
Remove S::S() and initialize S::f in-class;
Remove the aggregate initialization (= {});
Compile without -O2.
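A minimal sketch of the source-level variants listed above, assuming C++11 or later. The type names (SUserCtor, SPlain, SInClass) and the helper functions are hypothetical, introduced only to put the variants side by side; whether g++ emits a loop or unrolled stores for each is the question's observation, not something this code checks. The static_asserts add one nuance: an in-class member initializer also makes the implicit default constructor non-trivial, so among the listed variants, what the unrolled case has and the others lack is a user-provided constructor combined with = {}.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

constexpr std::size_t kSize = 4096;  // same size as in the question

// The question's element type: user-provided default constructor.
struct SUserCtor {
    float f;
    SUserCtor() : f(0.0f) {}
};

// Variant "Remove S::S()": trivial default constructor.
struct SPlain {
    float f;
};

// Variant "Remove S::S() and initialize S::f in-class".
struct SInClass {
    float f = 0.0f;
};

// Only SPlain is trivially default constructible; the in-class initializer
// makes SInClass's implicitly defined default constructor non-trivial.
static_assert(!std::is_trivially_default_constructible<SUserCtor>::value,
              "user-provided constructor is non-trivial");
static_assert(std::is_trivially_default_constructible<SPlain>::value,
              "no constructor, no member initializer: trivial");
static_assert(!std::is_trivially_default_constructible<SInClass>::value,
              "in-class initializer makes the default constructor non-trivial");

// Keep "= {}" but change the element type: every element is initialized
// from an empty initializer list, so f ends up 0.0f in both cases.
std::array<SPlain, kSize> make_plain() {
    std::array<SPlain, kSize> a = {};
    return a;
}

std::array<SInClass, kSize> make_in_class() {
    std::array<SInClass, kSize> a = {};
    return a;
}

// Keep the question's type but drop "= {}": default-initialization still
// runs SUserCtor() for every element.
std::array<SUserCtor, kSize> make_default_initialized() {
    std::array<SUserCtor, kSize> a;
    return a;
}
```

All three helpers yield arrays whose elements compare equal to 0.0f; comparing the -O2 assembly of each against the question's original program is the way to see which form g++ turns into a loop.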

I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?
[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]
Answer
There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this testcase either a regression or an edge case not covered by the patch. For what it's worth, two of the bug report's test cases (1, 2) exhibit symptoms for me on both GCC 4.9.0 and 5.3.1.
There are two more related bug reports:
 Bug 68203 - Infinite compilation time on struct with nested array with -std=c++11
 
  Andrew Pinski 2015-11-04 07:56:57 UTC 

This is most likely a memory hog which is generating lots of default
  constructors rather than a loop over them.
That one claims to be a duplicate of this one:
  Bug 56671 - Gcc uses large amounts of memory and processor power with large C++11 bitsets
 
  Jonathan Wakely 2016-01-26 15:12:27 UTC 

Generating the array initialization for this constexpr constructor is
  the problem:
  constexpr _Base_bitset(unsigned long long __val) noexcept
  : _M_w{ _WordT(__val) } { }

 
 
Indeed, if we change it to S a[4096] {}; we don't get the problem.
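For comparison, a minimal sketch of the built-in-array form mentioned here, keeping the question's S unchanged. The function name sum_builtin_array is hypothetical; it just reads every element back so the array has a use. The code only demonstrates that the form is valid and zero-initializes each element through S::S(); it says nothing about code generation, which has to be checked in the -O2 assembly (for example with g++ -S).

```cpp
#include <cstddef>

constexpr std::size_t kSize = 4096;  // same size as in the question

// The question's element type, unchanged: user-provided default constructor.
struct S {
    float f;
    S() : f(0.0f) {}
};

// Built-in array instead of std::array. "{}" list-initializes the array;
// each element is initialized from an empty initializer list, which runs S::S().
float sum_builtin_array() {
    S a[kSize] {};
    float sum = 0.0f;
    for (const S& e : a)
        sum += e.f;
    return sum;
}
```

Since every element is 0.0f, sum_builtin_array() returns 0.0f.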
Using perf we can see where GCC is spending most of its time. First:
  perf record g++ -std=c++11 -O2 test.cpp
Then perf report:
  10.33%  cc1plus   cc1plus                 [.] get_ref_base_and_extent
   6.36%  cc1plus   cc1plus                 [.] memrefs_conflict_p
   6.25%  cc1plus   cc1plus                 [.] vn_reference_lookup_2
   6.16%  cc1plus   cc1plus                 [.] exp_equiv_p
   5.99%  cc1plus   cc1plus                 [.] walk_non_aliased_vuses
   5.02%  cc1plus   cc1plus                 [.] find_base_term
   4.98%  cc1plus   cc1plus                 [.] invalidate
   4.73%  cc1plus   cc1plus                 [.] write_dependence_p
   4.68%  cc1plus   cc1plus                 [.] estimate_calls_size_and_time
   4.11%  cc1plus   cc1plus                 [.] ix86_find_base_term
   3.41%  cc1plus   cc1plus                 [.] rtx_equal_p
   2.87%  cc1plus   cc1plus                 [.] cse_insn
   2.77%  cc1plus   cc1plus                 [.] record_store
   2.66%  cc1plus   cc1plus                 [.] vn_reference_eq
   2.48%  cc1plus   cc1plus                 [.] operand_equal_p
   1.21%  cc1plus   cc1plus                 [.] integer_zerop
   1.00%  cc1plus   cc1plus                 [.] base_alias_check

This won't mean much to anyone but GCC developers but it's still interesting to see what's taking up so much compilation time.
Clang 3.7.0 does a much better job at this than GCC. At -O2 it takes less than a second to compile, produces a much smaller executable (8960 bytes) and this assembly:
0000000000400810 <main>:
  400810:   53                      push   rbx
  400811:   48 81 ec 00 40 00 00    sub    rsp,0x4000
  400818:   48 8d 3c 24             lea    rdi,[rsp]
  40081c:   31 db                   xor    ebx,ebx
  40081e:   31 f6                   xor    esi,esi
  400820:   ba 00 40 00 00          mov    edx,0x4000
  400825:   e8 56 fe ff ff          call   400680 <memset@plt>
  40082a:   66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
  400830:   f3 0f 10 04 1c          movss  xmm0,DWORD PTR [rsp+rbx*1]
  400835:   f3 0f 5a c0             cvtss2sd xmm0,xmm0
  400839:   bf 60 10 60 00          mov    edi,0x601060
  40083e:   e8 9d fe ff ff          call   4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
  400843:   48 83 c3 04             add    rbx,0x4
  400847:   48 81 fb 00 40 00 00    cmp    rbx,0x4000
  40084e:   75 e0                   jne    400830 <main+0x20>
  400850:   31 c0                   xor    eax,eax
  400852:   48 81 c4 00 40 00 00    add    rsp,0x4000
  400859:   5b                      pop    rbx
  40085a:   c3                      ret    
  40085b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

On the other hand, with GCC 5.3.1 and no optimizations, it compiles very quickly but still produces a 95,328-byte executable. Compiling with -O2 reduces the executable size to 53,912 bytes, but compilation takes 4 seconds. I would definitely report this to their Bugzilla.
