力/说服/诀窍GCC为开卷_Longer_循环? [英] Force/Convince/Trick GCC into Unrolling _Longer_ Loops?

查看:184
本文介绍了力/说服/诀窍GCC为开卷_Longer_循环?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我如何说服GCC解开一个循环,迭代的次数是已知的,但大?

How do I convince GCC to unroll a loop where the number of iterations is known, but large?

我与 -O3 编译。

问题的真正code是比较复杂的,当然,但这里有一个熬浓的例子,具有相同的行为:

The real code in question is more complex, of course, but here's a boiled-down example that has the same behavior:

int const constants[] = { 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144 };

int get_sum_1()
{
    int total = 0;
    for (int i = 0; i < CONSTANT_COUNT; ++i)
    {
        total += constants[i];
    }
    return total;
}

...如果 CONSTANT_COUNT 被定义为8(或更少),然后GCC将展开循环,传播常数,然后向下降低整个功能,一个简单的返回&lt;值取代; 。如果,另一方面, CONSTANT_COUNT 是9(或更高),那么循环尚未解开,和GCC产生哪些循环二进制,读取常量,并且在将它们相加运行时 - 即使在理论上,功能仍然可以向下优化以刚刚返回一个常数。 (是的,我看了看反编译的二进制文件。)

...if CONSTANT_COUNT is defined as 8 (or less) then GCC will unroll the loop, propagate the constants, and reduce the entire function down to a simple return <value>;. If, on the other hand, CONSTANT_COUNT is 9 (or greater) then the loop is not unrolled, and GCC produces a binary which loops, reads the constants, and adds them at run-time - even though, in theory, the function could still be optimized down to just returning a constant. (Yes, I've looked at the decompiled the binary.)

如果我的手动的展开循环,像这样的:

If I manually unroll the loop, like this:

int get_sum_2()
{
    int total = 0;
    total += constants[0];
    total += constants[1];
    total += constants[2];
    total += constants[3];
    total += constants[4];
    total += constants[5];
    total += constants[6];
    total += constants[7];
    total += constants[8];
    //total += constants[9];
    return total;
}

或者这样:

#define ADD_CONSTANT(z, v, c) total += constants[v];

int get_sum_2()
{
    int total = 0;
    BOOST_PP_REPEAT(CONSTANT_COUNT, ADD_CONSTANT, _)
    return total;
}

...那么函数优化到返回一个常数。因此,GCC似乎是能够处理较大的回路,一旦展开的恒定传播;挂断似乎刚开GCC考虑首先展开较长的循环。

...then the function is optimized down to returning a constant. So, GCC appears to be able to handle the constant propagation for larger loops, once unrolled; the hang-up appears to be just getting GCC to consider unrolling the longer loop in the first place.

然而,无论是手展开,也不 BOOST_PP_REPEAT 都是可行的选择,因为有部分的情况下, CONSTANT_COUNT 是一个运行时前pression和的相同的code仍然需要对这些情况下正常工作。 (性能不是在这种情况下同样重要。)

However, neither hand-unrolling nor BOOST_PP_REPEAT are viable options, because there are some cases where CONSTANT_COUNT is a run-time expression, and the same code still needs to work correctly for those cases. (Performance isn't as critical in those cases.)

我在C(不是C ++),所以没有模板元编程,也不 constexpr 是提供给我的工作。

I'm working in C (not C++) so neither template meta-programming nor constexpr are available to me.

我试过 -funroll-循环 -funroll-全循环 -fpeel-循环,以及 MAX-展开-的insn , MAX-平均展开-的insn MAX-解开次最大去皮-的insn MAX-剥离倍 MAX-完全去皮-的insn MAX-完全剥离倍,其中没有一个似乎有所作为。

I've tried -funroll-loops, -funroll-all-loops, -fpeel-loops, and setting large values for max-unrolled-insns, max-average-unrolled-insns, max-unroll-times, max-peeled-insns, max-peel-times, max-completely-peeled-insns, and max-completely-peel-times, none of which seem to make a difference.

我使用GCC 4.8.2,在Linux上,x86_64的。

I'm using GCC 4.8.2, on Linux, x86_64.

任何想法?是否有一个标志或参数我失踪...?

Any ideas? Is there a flag or parameter I'm missing...?

推荐答案

我不知道,如果这种解决方法适用于您的实际问题,但我发现,在一个x86_64的运行GCC 4.9.0 20140604($ P $租赁前)抛物线的GNU / Linux解开下面的循环直至并包括 CONSTANT_COUNT == 33

I'm not sure if this workaround is applicable to your actual problem but I found that GCC 4.9.0 20140604 (prerelease) on an x86_64 running Parabola GNU/Linux unrolls the following loop up to and including CONSTANT_COUNT == 33.

int
get_sum()
{
  int total = 0;
  int i, j, k = 0;
  for (j = 0; j < 2; ++j)
    {
      for (i = 0; i < CONSTANT_COUNT / 2; ++i)
        {
          total += constants[k++];
        }
    }
  if (CONSTANT_COUNT % 2)
    total += constants[k];
  return total;
}

我只通过它 -O3 标记。组装code为 get_sum 真的只是

I have only passed it the -O3 flag. The assembly code for get_sum is really just

movl $561, %eax
ret

我没有尝试,但也许该模式可进一步延长。

I did not try but maybe the pattern can be extended even further.

这似乎很奇怪,我认为这工作,因为 - 至少我的人的眼睛 - 在code看起来要复杂得多了。不幸的是,它是一个相当侵入性的方式来强制展开。编译器的标志将是更好。

It seems odd to me that this works since – at least to my human eyes – the code looks much more complex now. Unfortunately, it is a rather intrusive way to force the unrolling. A compiler flag would have been much nicer.

这篇关于力/说服/诀窍GCC为开卷_Longer_循环?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆