Loop unrolling behaviour in GCC


Question


This question is in part a follow-up to the question "GCC 5.1 Loop unrolling".

According to the GCC documentation, and as stated in my answer to the above question, flags such as -funroll-loops turn on "complete loop peeling (i.e. complete removal of loops with a small constant number of iterations)". Therefore, when such a flag is enabled, the compiler can choose to unroll a loop if it determines that this would optimise the execution of a given piece of code.

Nevertheless, I noticed in one of my projects that GCC would sometimes unroll loops even though the relevant flags were not enabled. For instance, consider the following simple piece of code:

int main(int argc, char **argv)
{
  int k = 0;
  for( k = 0; k < 5; ++k )
  {
    volatile int temp = k;
  }
}

When compiling with -O1, the loop is unrolled and the following assembly code is generated with any modern version of GCC:

main:
        movl    $0, -4(%rsp)
        movl    $1, -4(%rsp)
        movl    $2, -4(%rsp)
        movl    $3, -4(%rsp)
        movl    $4, -4(%rsp)
        movl    $0, %eax
        ret

Even when compiling with the additional -fno-unroll-loops -fno-peel-loops to make sure the flags are disabled, GCC unexpectedly still performs loop unrolling on the example described above.

This observation leads me to the following closely related questions. Why does GCC perform loop unrolling even though the flags corresponding to this behaviour are disabled? Is unrolling also controlled by other flags, which can make the compiler unroll a loop in some cases even though -funroll-loops is disabled? Is there a way to completely disable loop unrolling in GCC (apart from compiling with -O0)?

Interestingly, the Clang compiler has the expected behaviour here: it seems to perform unrolling only when -funroll-loops is enabled, and not in other cases.

Thanks in advance, any additional insights on this matter would be greatly appreciated!

Answer

Why does GCC perform loop unrolling even though the flags corresponding to this behaviour are disabled?

Think of it pragmatically: what do you actually want when you pass such a flag to the compiler? No C++ developer asks GCC to unroll (or not unroll) loops just for the sake of seeing loops (or not) in the assembly code; there is always a goal behind it. The goal of -fno-unroll-loops is, for example, to sacrifice a bit of speed in order to reduce the size of the binary, if you are developing embedded software with limited storage. The goal of -funroll-loops, on the other hand, is to tell the compiler that you do not care about the size of your binary, so it should not hesitate to unroll loops.

But that does not mean the compiler will blindly unroll, or refuse to unroll, all your loops!

In your example, the reason is simple: the loop body is a single instruction, a few bytes on any platform, and the compiler knows that this is negligible. The unrolled code takes almost the same space as the assembly needed for the loop itself (sub + mov + jne on x86-64).

This is why GCC 6.2, with -O3 -fno-unroll-loops, turns this code:

int mul(int k, int j) 
{   
  for (int i = 0; i < 5; ++i)
    volatile int k = j;

  return k; 
}

... to the following assembly code:

 mul(int, int):
  mov    DWORD PTR [rsp-0x4],esi
  mov    eax,edi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi
  mov    DWORD PTR [rsp-0x4],esi  
  ret    

It does not listen to you because unrolling would hardly change the size of the binary (depending on the architecture), while the result is faster. However, if you increase the loop count a bit...

int mul(int k, int j) 
{   
  for (int i = 0; i < 20; ++i)
    volatile int k = j;

  return k; 
}

... it follows your hint:

 mul(int, int):
  mov    eax,edi
  mov    edx,0x14
  nop    WORD PTR [rax+rax*1+0x0]
  sub    edx,0x1
  mov    DWORD PTR [rsp-0x4],esi
  jne    400520 <mul(int, int)+0x10>
  repz ret 

You will get the same behavior if you keep the loop count at 5 but add more code to the loop body.
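As a hypothetical illustration of that last point, here is a five-iteration loop with a heavier body. The function name `fill` and the statements inside the loop are assumptions made for this sketch, not part of the original test. Whether GCC keeps it as a loop depends on the compiler version and its internal size thresholds, so the generated assembly is the only reliable reference.

```cpp
// Hypothetical variant of the example above: still 5 iterations, but each
// iteration now performs several volatile accesses, so unrolling would
// replicate a noticeably larger body.
int fill(int k, int j)
{
  volatile int a = 0, b = 0, c = 0;
  for (int i = 0; i < 5; ++i)
  {
    a = j + i;   // volatile store
    b = a * 3;   // volatile load followed by a volatile store
    c = b - k;   // volatile load followed by a volatile store
  }
  return c;
}
```

Compiling this with -O3 -fno-unroll-loops and comparing the output with that of the one-statement version should show whether the loop survives on a given GCC version.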

To sum up, think of all these optimization flags as hints to the compiler, and look at them from a pragmatic developer's point of view. It is always a trade-off, and when you build software you never really want to ask for all loops, or no loops, to be unrolled.
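If a specific loop really must stay a loop, newer GCC releases offer a per-loop request in addition to the global flags. The sketch below assumes GCC 8 or later, where `#pragma GCC unroll n` is available; it is not something the GCC 6.2 used above supports, and the function name `store_five` is hypothetical.

```cpp
// Assumes GCC 8 or newer. Other compilers may ignore the pragma,
// possibly with a warning about an unknown pragma.
int store_five(int j)
{
  volatile int sink = 0;
  // A factor of 1 asks GCC not to unroll the loop that follows.
#pragma GCC unroll 1
  for (int i = 0; i < 5; ++i)
    sink = j + i;
  return sink;
}
```

The pragma has to sit directly in front of the loop it applies to, and checking the assembly is still the only way to confirm what the compiler actually did.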

As a final note, another very similar example is the -f(no-)inline-functions flag. Every day I fight the compiler to inline (or not!) some of my functions (with the inline keyword and __attribute__((noinline)) in GCC), and when I check the assembly code I see that this smartass sometimes still does what it wants, for instance when I want to inline a function that is definitely too long for its taste. And most of the time it is the right thing to do, and I am happy!
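For reference, a minimal sketch of the two inlining requests mentioned above. The function names `scale`, `offset` and `combine` are hypothetical; only the `inline` keyword and the GCC attribute come from the text.

```cpp
// GCC-specific attribute: ask the compiler never to inline this function.
__attribute__((noinline)) int scale(int x)
{
  return x * 3;
}

// The inline keyword is only a hint; the compiler may still decide
// either way depending on its own cost estimate.
inline int offset(int x)
{
  return x + 7;
}

int combine(int x)
{
  return scale(x) + offset(x);
}
```

Inspecting the assembly of `combine` shows whether a call to `scale` remains and whether `offset` was folded into the caller.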
