std::array with aggregate initialization on g++ generates huge code
Question
On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable:
#include <array>
#include <iostream>

int main()
{
    constexpr std::size_t size = 4096;

    struct S
    {
        float f;
        S() : f(0.0f) {}
    };

    std::array<S, size> a = {}; // <-- note aggregate initialization

    for (auto& e : a)
        std::cerr << e.f;

    return 0;
}
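Increasing size increases both compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using -Os makes no difference.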
$ time g++ -O2 -std=c++11 test.cpp
real 0m4.178s
user 0m4.060s
sys 0m0.068s
Inspecting the assembly code reveals that the initialization of a is unrolled, generating 4096 movl instructions:
main:
.LFB1313:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
subq $16384, %rsp
.cfi_def_cfa_offset 16400
movl $0x00000000, (%rsp)
movl $0x00000000, 4(%rsp)
movq %rsp, %rbx
movl $0x00000000, 8(%rsp)
movl $0x00000000, 12(%rsp)
movl $0x00000000, 16(%rsp)
[...skipping 4000 lines...]
movl $0x00000000, 16376(%rsp)
movl $0x00000000, 16380(%rsp)
This only happens when T (the element type, S in the code above) has a non-trivial constructor and the array is initialized using {}. If I do any of the following, g++ generates a simple loop:
- Remove S::S();
- Remove S::S() and initialize S::f in-class;
- Remove the aggregate initialization (= {});
- Compile without -O2.
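Two of the workarounds listed above can be sketched as follows. This is only an illustration of what the source changes look like, not a measurement of what g++ emits for them; the names WithCtor, WithNsdmi and all_zero, along with the two wrapper functions, are made up for this example and do not appear in the original code.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t size = 4096;

// Variant A: keep the user-provided constructor, but drop "= {}".
struct WithCtor
{
    float f;
    WithCtor() : f(0.0f) {}
};

// Variant B: remove the constructor and initialize f in-class instead.
struct WithNsdmi
{
    float f = 0.0f;
};

// Hypothetical helper: true if every element's f compares equal to zero.
template <typename T>
bool all_zero(const std::array<T, size>& a)
{
    for (const auto& e : a)
        if (e.f != 0.0f)
            return false;
    return true;
}

bool variant_a_is_zeroed()
{
    // No "= {}": default-initialization still runs WithCtor() per element.
    std::array<WithCtor, size> a;
    return all_zero(a);
}

bool variant_b_is_zeroed()
{
    // "= {}" kept, but the element type has no user-provided constructor.
    std::array<WithNsdmi, size> a = {};
    return all_zero(a);
}
```

In both variants every element ends up with f equal to 0.0f, so the observable behaviour of the original program is preserved; whether the generated code is a loop should be checked in the assembly for the compiler version in use.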
I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?
[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]
Recommended answer
There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this test case either a regression or an edge case not covered by the patch. For what it's worth, two of the test cases in that bug report (including the one in comment 2, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59659#c2) exhibit symptoms for me on both GCC 4.9.0 and 5.3.1.
There are two more related bug reports:
Bug 68203 - infinite compilation time on a struct with nested arrays with -std=c++11
Andrew Pinski 2015-11-04 07:56:57 UTC
This is most likely a memory hog which is generating lots of default constructors rather than a loop over them.
That one claims to be a duplicate of this one:
Bug 56671 - GCC uses large amounts of memory and processor power with large C++11 bitsets
Jonathan Wakely 2016-01-26 15:12:27 UTC
Generating the array initialization for this constexpr constructor is the problem:
constexpr _Base_bitset(unsigned long long __val) noexcept
: _M_w{ _WordT(__val) } { }
Indeed, if we change it to S a[4096] {}; we don't get the problem.
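A minimal sketch of that built-in array variant, assuming the same S as in the question. The constant name size matches the question, while the wrapper function sum_builtin_array is hypothetical and only exists so the snippet has something to return.

```cpp
#include <cstddef>

struct S
{
    float f;
    S() : f(0.0f) {}
};

constexpr std::size_t size = 4096;

// Hypothetical wrapper: builds the array and sums the f members.
float sum_builtin_array()
{
    // Built-in array instead of std::array; "{}" value-initializes
    // each element, which calls S::S() for every one of them.
    S a[size] {};

    float total = 0.0f;
    for (const auto& e : a)
        total += e.f;
    return total;
}
```

The trade-off is losing the std::array interface (size(), at(), copy semantics), so this is more useful as a diagnostic comparison than as a drop-in replacement.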
Using perf we can see where GCC is spending most of its time. First:
perf record g++ -std=c++11 -O2 test.cpp
Then perf report:
10.33% cc1plus cc1plus [.] get_ref_base_and_extent
6.36% cc1plus cc1plus [.] memrefs_conflict_p
6.25% cc1plus cc1plus [.] vn_reference_lookup_2
6.16% cc1plus cc1plus [.] exp_equiv_p
5.99% cc1plus cc1plus [.] walk_non_aliased_vuses
5.02% cc1plus cc1plus [.] find_base_term
4.98% cc1plus cc1plus [.] invalidate
4.73% cc1plus cc1plus [.] write_dependence_p
4.68% cc1plus cc1plus [.] estimate_calls_size_and_time
4.11% cc1plus cc1plus [.] ix86_find_base_term
3.41% cc1plus cc1plus [.] rtx_equal_p
2.87% cc1plus cc1plus [.] cse_insn
2.77% cc1plus cc1plus [.] record_store
2.66% cc1plus cc1plus [.] vn_reference_eq
2.48% cc1plus cc1plus [.] operand_equal_p
1.21% cc1plus cc1plus [.] integer_zerop
1.00% cc1plus cc1plus [.] base_alias_check
This won't mean much to anyone but GCC developers but it's still interesting to see what's taking up so much compilation time.
Clang 3.7.0 does a much better job at this than GCC. At -O2 it takes less than a second to compile, produces a much smaller executable (8,960 bytes), and generates this assembly:
0000000000400810 <main>:
400810: 53 push rbx
400811: 48 81 ec 00 40 00 00 sub rsp,0x4000
400818: 48 8d 3c 24 lea rdi,[rsp]
40081c: 31 db xor ebx,ebx
40081e: 31 f6 xor esi,esi
400820: ba 00 40 00 00 mov edx,0x4000
400825: e8 56 fe ff ff call 400680 <memset@plt>
40082a: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
400830: f3 0f 10 04 1c movss xmm0,DWORD PTR [rsp+rbx*1]
400835: f3 0f 5a c0 cvtss2sd xmm0,xmm0
400839: bf 60 10 60 00 mov edi,0x601060
40083e: e8 9d fe ff ff call 4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
400843: 48 83 c3 04 add rbx,0x4
400847: 48 81 fb 00 40 00 00 cmp rbx,0x4000
40084e: 75 e0 jne 400830 <main+0x20>
400850: 31 c0 xor eax,eax
400852: 48 81 c4 00 40 00 00 add rsp,0x4000
400859: 5b pop rbx
40085a: c3 ret
40085b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
On the other hand, GCC 5.3.1 with no optimizations compiles very quickly but still produces a 95,328-byte executable. Compiling with -O2 reduces the executable size to 53,912 bytes, but compilation takes 4 seconds. I would definitely report this to their Bugzilla.