GCC 5.1 Loop unrolling

Question

Given the following code

#include <stdio.h>

int main(int argc, char **argv)
{
  int k = 0;
  for( k = 0; k < 20; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

Using GCC 5.1 or later with

-x c -std=c99 -O3 -funroll-all-loops --param max-completely-peeled-insns=1000 --param max-completely-peel-times=10000

does partial loop unrolling: it unrolls the loop ten times and then does a conditional jump.

.LC0:
        .string "%d\n"
main:
        pushq   %rbx
        xorl    %ebx, %ebx
.L2:
        movl    %ebx, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    1(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    2(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    3(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    4(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    5(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    6(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    7(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    8(%rbx), %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        leal    9(%rbx), %esi
        xorl    %eax, %eax
        movl    $.LC0, %edi
        addl    $10, %ebx
        call    printf
        cmpl    $20, %ebx
        jne     .L2
        xorl    %eax, %eax
        popq    %rbx
        ret
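
Read back as C, the GCC 5.1 listing corresponds roughly to the loop below: ten copies of the body per pass, k advanced by 10, and a single k != 20 test at the bottom. This is an illustrative sketch only; to keep it checkable, the hypothetical function partially_unrolled stores each value into an array where the real code passes it to printf("%d\n", ...).

```c
/* Hypothetical C rendering of the GCC 5.1 output: the 20-iteration loop
   unrolled by a factor of 10. Each out[...] store stands in for one
   printf("%d\n", ...) call in the assembly. out must have room for 20
   ints. Returns the number of values written. */
int partially_unrolled(int out[20])
{
    int n = 0;
    int k = 0;              /* lives in %ebx in the listing */
    do {
        out[n++] = k;       /* movl %ebx, %esi */
        out[n++] = k + 1;   /* leal 1(%rbx), %esi */
        out[n++] = k + 2;
        out[n++] = k + 3;
        out[n++] = k + 4;
        out[n++] = k + 5;
        out[n++] = k + 6;
        out[n++] = k + 7;
        out[n++] = k + 8;
        out[n++] = k + 9;   /* leal 9(%rbx), %esi */
        k += 10;            /* addl $10, %ebx */
    } while (k != 20);      /* cmpl $20, %ebx; jne .L2 */
    return n;
}
```

The 4.9.2 listing, by contrast, has no loop left at all: the 20 calls appear one after another with the constants 0 to 19 loaded directly into the argument register.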

But older versions of GCC, such as 4.9.2, create the desired assembly:

.LC0:
    .string "%d\n"
main:
    subq    $8, %rsp
    xorl    %edx, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $1, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $2, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $4, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $5, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $6, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $7, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $8, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $9, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $10, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $11, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $12, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $13, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $14, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $15, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $16, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $17, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $18, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    movl    $19, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp
    ret

Is there a way to force later versions of GCC to produce the same output?

The assembly was produced with https://godbolt.org/g/D1AR6i

Edit: This is not a duplicate question, since the problem of completely unrolling loops with later versions of GCC has not yet been solved. Passing --param max-completely-peeled-insns=1000 --param max-completely-peel-times=10000 has no effect on the generated assembly with GCC >= 5.1.

Recommended answer

The flags and parameters you are using do not guarantee that the loop will be completely unrolled. The GCC documentation states that the -funroll-all-loops flag you are using "turns on complete loop peeling (i.e. complete removal of loops with a small constant number of iterations)". If the compiler decides that the number of iterations for a given piece of code is too large, it may only do partial peeling or unrolling, as it has done here. Furthermore, the param options you are using only set maximum values: they do not force complete unrolling of loops below those values. They only mean that a loop with more iterations than the maximum you have set will not be completely unrolled.

Many factors are taken into account when optimising. Here the bottleneck in your code is the call to the printf function, and the compiler probably takes this into account in its cost calculations, or judges that the code-size overhead of unrolling is too large. Since you are nevertheless telling it to unroll loops, it seems to decide that the best solution is to unroll the initial loop ten times and add a conditional jump.

If you replace printf with something different, the compiler may optimise differently. For instance, try replacing it with the following:

volatile int temp = k;

With the previous code snippet, the loop will be fully unrolled by the newer versions of GCC (and by the older ones as well). Note that the volatile keyword is just a trick to stop the compiler from removing the loop entirely.
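
As a sketch, here is the question's loop with the answer's substitution applied, wrapped in a function so it can be called from the original main (the name volatile_store_loop is hypothetical). Each iteration performs one volatile store that nothing reads, so all 20 stores must be kept, but there is no longer a printf call for the cost model to weigh.

```c
/* The question's loop with printf replaced by the answer's volatile store.
   Nothing reads temp, but every access to a volatile object is an
   observable side effect, so the compiler cannot delete the loop as dead
   code. With the call gone, the body is small enough that GCC may peel
   all 20 iterations into straight-line stores. */
void volatile_store_loop(void)
{
    int k = 0;
    for (k = 0; k < 20; ++k)
    {
        volatile int temp = k;
    }
}
```

Compiling this with the question's flags on Compiler Explorer and comparing the listing with the printf version is a quick way to check the unrolling behaviour of a particular GCC release; the exact instructions will vary between versions.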
