可编译器优化可变长度的循环吗? [英] Can Compiler Optimize Loop with Variable Length?

查看:112
本文介绍了可编译器优化可变长度的循环吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果循环的最后一个索引( a b 在下面的示例中)在编译时不知道?

Can the compiler optimize loops if the last index of the loops (a and b in the following example) are not known at compile time?

未优化:

int* arr = new int[a*b];
for (i = 0; i < a; ++i){
    for(j = 0; j < b; ++j){
        arr[i*b+j] *= 8;
    }
}

//delete arr after done.

更多优化:(假设a和b很大...)

More Optimized: (assuming a and b are large...)

int c = a*b;
int* arr = new int[c];
for (i = 0; i < c; ++i){
        arr[c] *= 8;
}

//delete arr after done.


推荐答案

如果将数组视为线性空间,gcc

If you treat the array as linear space, gcc (and presumably others) will optimise even without knowing the extents at compile time.

此代码:

void by8(int* arr, int a, int b)
{
  auto extent = a * b;
  for (int i = 0; i < extent; ++i)
  {
    arr[i] *= 8;
  }
}

编译循环是向量化的)

by8(int*, int, int):
        imull   %esi, %edx
        testl   %edx, %edx
        jle     .L23
        movq    %rdi, %rax
        andl    $31, %eax
        shrq    $2, %rax
        negq    %rax
        andl    $7, %eax
        cmpl    %edx, %eax
        cmova   %edx, %eax
        cmpl    $8, %edx
        jg      .L26
        movl    %edx, %eax
.L3:
        sall    $3, (%rdi)
        cmpl    $1, %eax
        je      .L15
        sall    $3, 4(%rdi)
        cmpl    $2, %eax
        je      .L16
        sall    $3, 8(%rdi)
        cmpl    $3, %eax
        je      .L17
        sall    $3, 12(%rdi)
        cmpl    $4, %eax
        je      .L18
        sall    $3, 16(%rdi)
        cmpl    $5, %eax
        je      .L19
        sall    $3, 20(%rdi)
        cmpl    $6, %eax
        je      .L20
        sall    $3, 24(%rdi)
        cmpl    $7, %eax
        je      .L21
        sall    $3, 28(%rdi)
        movl    $8, %ecx
.L5:
        cmpl    %eax, %edx
        je      .L27
.L4:
        leal    -1(%rdx), %r8d
        movl    %edx, %r9d
        movl    %eax, %r10d
        subl    %eax, %r9d
        subl    %eax, %r8d
        leal    -8(%r9), %esi
        shrl    $3, %esi
        addl    $1, %esi
        leal    0(,%rsi,8), %r11d
        cmpl    $6, %r8d
        jbe     .L7
        leaq    (%rdi,%r10,4), %r10
        xorl    %eax, %eax
        xorl    %r8d, %r8d
.L9:
        vmovdqa (%r10,%rax), %ymm0
        addl    $1, %r8d
        vpslld  $3, %ymm0, %ymm0
        vmovdqa %ymm0, (%r10,%rax)
        addq    $32, %rax
        cmpl    %r8d, %esi
        ja      .L9
        addl    %r11d, %ecx
        cmpl    %r11d, %r9d
        je      .L22
        vzeroupper
.L7:
        movslq  %ecx, %rax
        sall    $3, (%rdi,%rax,4)
        leal    1(%rcx), %eax
        cmpl    %eax, %edx
        jle     .L23
        cltq
        sall    $3, (%rdi,%rax,4)
        leal    2(%rcx), %eax
        cmpl    %eax, %edx
        jle     .L23
        cltq
        sall    $3, (%rdi,%rax,4)
        leal    3(%rcx), %eax
        cmpl    %eax, %edx
        jle     .L23
        cltq
        sall    $3, (%rdi,%rax,4)
        leal    4(%rcx), %eax
        cmpl    %eax, %edx
        jle     .L23
        cltq
        sall    $3, (%rdi,%rax,4)
        leal    5(%rcx), %eax
        cmpl    %eax, %edx
        jle     .L23
        cltq
        addl    $6, %ecx
        sall    $3, (%rdi,%rax,4)
        cmpl    %ecx, %edx
        jle     .L28
        movslq  %ecx, %rcx
        sall    $3, (%rdi,%rcx,4)
        ret
.L22:
        vzeroupper
.L23:
        ret
.L27:
        ret
.L26:
        testl   %eax, %eax
        jne     .L3
        xorl    %ecx, %ecx
        jmp     .L4
.L28:
        ret
.L21:
        movl    $7, %ecx
        jmp     .L5
.L15:
        movl    $1, %ecx
        jmp     .L5
.L16:
        movl    $2, %ecx
        jmp     .L5
.L17:
        movl    $3, %ecx
        jmp     .L5
.L18:
        movl    $4, %ecx
        jmp     .L5
.L19:
        movl    $5, %ecx
        jmp     .L5
.L20:
        movl    $6, %ecx
        jmp     .L5


$ b b

编译器:gcc 5.4,带有命令行选项:-std = c ++ 14 -O3 -march = native

compiler : gcc 5.4 with command line options: -std=c++14 -O3 -march=native

这篇关于可编译器优化可变长度的循环吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆