How does GCC optimize out an unused variable incremented inside a loop?


Problem description

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly, adding 1 2000000000 times should become "add 2000000000 one time"). So:

gcc test.c and then time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s  

gcc -O2 test.c and then time on a.out gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s  

Then I generated assembly for both with gcc -S. The first one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

.L3 adds, .L2 compares -4(%rbp) with 1999999999 and loops back to .L3 if i < 2000000000.
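
For readers less used to assembly, the control flow of that listing can be rendered in C with goto statements that mirror the labels. This is only a sketch: the function name, the limit parameter (the listing hard-codes 1999999999) and the returned value are hypothetical additions so the shape can be checked with a small bound, and it assumes limit is not negative.

```c
#include <assert.h>

/* Hypothetical C rendering of the unoptimized listing; limit stands in for 2000000000. */
int count_like_unoptimized_asm(int limit) {
    int count = 0;          /* movl $0, -8(%rbp) */
    int i = 0;              /* movl $0, -4(%rbp) */
    goto L2;                /* jmp .L2 */
L3:
    count = count + 1;      /* addl $1, -8(%rbp) */
    i = i + 1;              /* addl $1, -4(%rbp) */
L2:
    if (i <= limit - 1)     /* cmpl $1999999999, -4(%rbp) followed by jle .L3 */
        goto L3;
    return count;           /* the original main never uses count; returned here only for checking */
}

/* Small self-check with bounds far below the one in the question. */
void check_unoptimized_shape(void) {
    assert(count_like_unoptimized_asm(0) == 0);
    assert(count_like_unoptimized_asm(1000) == 1000);
}
```

Both variables live on the stack and are updated in memory on every iteration, which is consistent with the roughly seven seconds measured for the unoptimized build.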

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.

Can someone briefly explain:

  1. The gcc -S -O2 output.
  2. Whether the loop is optimized as I expected (one addition instead of many).

Solution

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop, so it removed the entire loop! That is why the optimized main contains nothing but a return (rep ret).

This is called Dead Code Elimination.
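
A minimal sketch of the idea, assuming a much smaller loop bound and hypothetical function names: the first function does the same kind of work as the loop in the question but never reads count afterwards, so a compiler is free to treat it like the second one. The return value of 0 is there only so the two can be compared.

```c
#include <assert.h>

/* Same shape as the loop in the question, with a smaller (assumed) bound. */
int loop_with_unused_result(void) {
    int i;
    int count = 0;
    for (i = 0; i < 1000; i++) {
        count = count + 1;
    }
    return 0;               /* count is never read after the loop */
}

/* Roughly what remains once the unused work is removed. */
int loop_removed(void) {
    return 0;
}

/* The two functions are indistinguishable to a caller. */
void check_same_observable_result(void) {
    assert(loop_with_unused_result() == loop_removed());
}
```

Without optimization the first function still runs its loop; with optimization enabled a compiler may emit the same code for both, since nothing observable depends on count.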

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT: I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.


I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT: I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits
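
In both the MSVC and GCC listings the constant 2000000000 is passed straight to printf, so no loop is left at run time. The equivalence this relies on can be sketched as follows. The function names are hypothetical and the bound is made a parameter (an assumption, since the original hard-codes it) so that it can be checked with small values. This illustrates the result only, not how either compiler derives it.

```c
#include <assert.h>

/* Hypothetical helper mirroring the loop from the question, with the bound as a parameter. */
int count_by_loop(int n) {
    int i;
    int count = 0;
    for (i = 0; i < n; i++) {
        count = count + 1;
    }
    return count;
}

/* The same result in one step; for n <= 0 the loop body never runs, so the count stays 0. */
int count_in_one_step(int n) {
    return n > 0 ? n : 0;
}

/* Spot-check that the two agree on a few small bounds. */
void check_counts_agree(void) {
    assert(count_by_loop(-5) == count_in_one_step(-5));
    assert(count_by_loop(0) == count_in_one_step(0));
    assert(count_by_loop(1) == count_in_one_step(1));
    assert(count_by_loop(1000) == count_in_one_step(1000));
}
```

With the bound fixed at 2000000000, as in the question, the one-step version is simply that constant, which matches the value loaded before the call to printf in the listings above.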
