How does GCC optimize out an unused variable incremented inside a loop?

Views: 217 | Published: 2016/8/18 12:54:18 | Tags: c, gcc, compiler-optimization, disassembly


Question

I wrote this simple C program:

int main() {
    int i;
    int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }
}

I wanted to see how the gcc compiler optimizes this loop (clearly, adding 1 two billion times should become a single addition of 2000000000). So:

Compiling with gcc test.c and then running time on a.out gives:

real 0m7.717s  
user 0m7.710s  
sys 0m0.000s

Compiling with gcc -O2 test.c and then running time on a.out gives:

real 0m0.003s  
user 0m0.000s  
sys 0m0.000s

Then I generated the assembly for both with gcc -S. The first one seems quite clear:

    .file "test.c"  
    .text  
.globl main
    .type   main, @function  
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    movl    $0, -4(%rbp)
    jmp .L2
.L3:
    addl    $1, -8(%rbp)
    addl    $1, -4(%rbp)
.L2:
    cmpl    $1999999999, -4(%rbp)
    jle .L3
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section    .note.GNU-stack,"",@progbits

.L3 increments both variables; .L2 compares -4(%rbp), which holds i, with 1999999999 and jumps back to .L3 as long as i < 2000000000.

Now the optimized one:

    .file "test.c"  
    .text
    .p2align 4,,15
.globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    rep
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
    .section .note.GNU-stack,"",@progbits

I can't understand at all what's going on there! I've got little knowledge of assembly, but I expected something like

addl $2000000000, -8(%rbp)

I even tried gcc -c -g -Wa,-a,-ad -O2 test.c to see the C code together with the assembly it was converted to, but the result was no clearer than the previous one.

Can someone briefly explain:

1. The gcc -S -O2 output.
2. Whether the loop is optimized as I expected (one addition instead of many)?

Solution

The compiler is even smarter than that. :)

In fact, it realizes that you aren't using the result of the loop, so it removes the entire loop! That is why the optimized main consists of nothing but rep ret, which simply returns.

This is called Dead Code Elimination.

A better test is to print the result:

#include <stdio.h>
int main(void) {
    int i; int count = 0;
    for(i = 0; i < 2000000000; i++){
        count = count + 1;
    }

    //  Print result to prevent Dead Code Elimination
    printf("%d\n", count);
}

EDIT: I've added the required #include <stdio.h>; the MSVC assembly listing corresponds to a version without the #include, but it should be the same.

I don't have GCC in front of me at the moment, since I'm booted into Windows. But here's the disassembly of the version with the printf() on MSVC:

EDIT: I had the wrong assembly output. Here's the correct one.

; 57   : int main(){

$LN8:
    sub rsp, 40                 ; 00000028H

; 58   : 
; 59   : 
; 60   :     int i; int count = 0;
; 61   :     for(i = 0; i < 2000000000; i++){
; 62   :         count = count + 1;
; 63   :     }
; 64   : 
; 65   :     //  Print result to prevent Dead Code Elimination
; 66   :     printf("%d\n",count);

    lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@
    mov edx, 2000000000             ; 77359400H
    call    QWORD PTR __imp_printf

; 67   : 
; 68   : 
; 69   : 
; 70   :
; 71   :     return 0;

    xor eax, eax

; 72   : }

    add rsp, 40                 ; 00000028H
    ret 0

So yes, Visual Studio does this optimization. I'd assume GCC probably does too.

And yes, GCC performs a similar optimization. Here's an assembly listing for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

        .file   "test.c"
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%d\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $2000000000, 8(%esp)
        movl    $.LC0, 4(%esp)
        movl    $1, (%esp)
        call    __printf_chk
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
        .section        .note.GNU-stack,"",@progbits
