力/说服/诀窍GCC为开卷_Longer_循环? [英] Force/Convince/Trick GCC into Unrolling _Longer_ Loops?
问题描述
我如何说服GCC解开一个循环,迭代的次数是已知的,但大?
How do I convince GCC to unroll a loop where the number of iterations is known, but large?
我与 -O3
编译。
问题的真正code是比较复杂的,当然,但这里有一个熬浓的例子,具有相同的行为:
The real code in question is more complex, of course, but here's a boiled-down example that has the same behavior:
int const constants[] = { 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144 };
int get_sum_1()
{
int total = 0;
for (int i = 0; i < CONSTANT_COUNT; ++i)
{
total += constants[i];
}
return total;
}
...如果 CONSTANT_COUNT
被定义为8(或更少),然后GCC将展开循环,传播常数,然后向下降低整个功能,一个简单的返回&lt;值取代;
。如果,另一方面, CONSTANT_COUNT
是9(或更高),那么循环尚未解开,和GCC产生哪些循环二进制,读取常量,并且在将它们相加运行时 - 即使在理论上,功能仍然可以向下优化以刚刚返回一个常数。 (是的,我看了看反编译的二进制文件。)
...if CONSTANT_COUNT
is defined as 8 (or less) then GCC will unroll the loop, propagate the constants, and reduce the entire function down to a simple return <value>;
. If, on the other hand, CONSTANT_COUNT
is 9 (or greater) then the loop is not unrolled, and GCC produces a binary which loops, reads the constants, and adds them at run-time - even though, in theory, the function could still be optimized down to just returning a constant. (Yes, I've looked at the decompiled the binary.)
如果我的手动的展开循环,像这样的:
If I manually unroll the loop, like this:
int get_sum_2()
{
int total = 0;
total += constants[0];
total += constants[1];
total += constants[2];
total += constants[3];
total += constants[4];
total += constants[5];
total += constants[6];
total += constants[7];
total += constants[8];
//total += constants[9];
return total;
}
或者这样:
#define ADD_CONSTANT(z, v, c) total += constants[v];
int get_sum_2()
{
int total = 0;
BOOST_PP_REPEAT(CONSTANT_COUNT, ADD_CONSTANT, _)
return total;
}
...那么函数优化到返回一个常数。因此,GCC似乎是能够处理较大的回路,一旦展开的恒定传播;挂断似乎刚开GCC考虑首先展开较长的循环。
...then the function is optimized down to returning a constant. So, GCC appears to be able to handle the constant propagation for larger loops, once unrolled; the hang-up appears to be just getting GCC to consider unrolling the longer loop in the first place.
然而,无论是手展开,也不 BOOST_PP_REPEAT
都是可行的选择,因为有部分的情况下, CONSTANT_COUNT
是一个运行时前pression和的相同的code仍然需要对这些情况下正常工作。 (性能不是在这种情况下同样重要。)
However, neither hand-unrolling nor BOOST_PP_REPEAT
are viable options, because there are some cases where CONSTANT_COUNT
is a run-time expression, and the same code still needs to work correctly for those cases. (Performance isn't as critical in those cases.)
我在C(不是C ++),所以没有模板元编程,也不 constexpr
是提供给我的工作。
I'm working in C (not C++) so neither template meta-programming nor constexpr
are available to me.
我试过 -funroll-循环
, -funroll-全循环
, -fpeel-循环
,以及 MAX-展开-的insn
, MAX-平均展开-的insn
, MAX-解开次
,最大去皮-的insn
, MAX-剥离倍
, MAX-完全去皮-的insn
和 MAX-完全剥离倍
,其中没有一个似乎有所作为。
I've tried -funroll-loops
, -funroll-all-loops
, -fpeel-loops
, and setting large values for max-unrolled-insns
, max-average-unrolled-insns
, max-unroll-times
, max-peeled-insns
, max-peel-times
, max-completely-peeled-insns
, and max-completely-peel-times
, none of which seem to make a difference.
我使用GCC 4.8.2,在Linux上,x86_64的。
I'm using GCC 4.8.2, on Linux, x86_64.
任何想法?是否有一个标志或参数我失踪...?
Any ideas? Is there a flag or parameter I'm missing...?
推荐答案
我不知道,如果这种解决方法适用于您的实际问题,但我发现,在一个x86_64的运行GCC 4.9.0 20140604($ P $租赁前)抛物线的GNU / Linux解开下面的循环直至并包括 CONSTANT_COUNT == 33
。
I'm not sure if this workaround is applicable to your actual problem but I found that GCC 4.9.0 20140604 (prerelease) on an x86_64 running Parabola GNU/Linux unrolls the following loop up to and including CONSTANT_COUNT == 33
.
int
get_sum()
{
int total = 0;
int i, j, k = 0;
for (j = 0; j < 2; ++j)
{
for (i = 0; i < CONSTANT_COUNT / 2; ++i)
{
total += constants[k++];
}
}
if (CONSTANT_COUNT % 2)
total += constants[k];
return total;
}
我只通过它 -O3
标记。组装code为 get_sum
真的只是
I have only passed it the -O3
flag. The assembly code for get_sum
is really just
movl $561, %eax
ret
我没有尝试,但也许该模式可进一步延长。
I did not try but maybe the pattern can be extended even further.
这似乎很奇怪,我认为这工作,因为 - 至少我的人的眼睛 - 在code看起来要复杂得多了。不幸的是,它是一个相当侵入性的方式来强制展开。编译器的标志将是更好。
It seems odd to me that this works since – at least to my human eyes – the code looks much more complex now. Unfortunately, it is a rather intrusive way to force the unrolling. A compiler flag would have been much nicer.
这篇关于力/说服/诀窍GCC为开卷_Longer_循环?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!