Why does vectorization fail?

Question

I want to use

-msse2 -ftree-vectorizer-verbose=2.

I have the following simple code:

int main(){
  int a[2048], b[2048], c[2048];
  int i;

  for (i=0; i<2048; i++){
      b[i]=0;
      c[i]=0;
  }

  for (i=0; i<2048; i++){
    a[i] = b[i] + c[i];
  }
  return 0;
}

Why do I get the following message?

 test.cpp:10: note: not vectorized: not enough data-refs in basic block.


Thanks!

Recommended answer

For what it's worth, after adding an asm volatile("": "+m"(a), "+m"(b), "+m"(c)::"memory"); near the end of main, my copy of gcc emits this:

400610:       48 81 ec 08 60 00 00    sub    $0x6008,%rsp
400617:       ba 00 20 00 00          mov    $0x2000,%edx
40061c:       31 f6                   xor    %esi,%esi
40061e:       48 8d bc 24 00 20 00    lea    0x2000(%rsp),%rdi
400625:       00
400626:       e8 b5 ff ff ff          callq  4005e0 <memset@plt>
40062b:       ba 00 20 00 00          mov    $0x2000,%edx
400630:       31 f6                   xor    %esi,%esi
400632:       48 8d bc 24 00 40 00    lea    0x4000(%rsp),%rdi
400639:       00
40063a:       e8 a1 ff ff ff          callq  4005e0 <memset@plt>
40063f:       31 c0                   xor    %eax,%eax
400641:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
400648:       c5 f9 6f 84 04 00 20    vmovdqa 0x2000(%rsp,%rax,1),%xmm0
40064f:       00 00
400651:       c5 f9 fe 84 04 00 40    vpaddd 0x4000(%rsp,%rax,1),%xmm0,%xmm0
400658:       00 00
40065a:       c5 f8 29 04 04          vmovaps %xmm0,(%rsp,%rax,1)
40065f:       48 83 c0 10             add    $0x10,%rax
400663:       48 3d 00 20 00 00       cmp    $0x2000,%rax
400669:       75 dd                   jne    400648 <main+0x38>

So it recognised that the first loop was just doing memset to a couple arrays and the second loop was doing a vector addition, which it appropriately vectorised.
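
To make that change concrete, here is a minimal sketch of the modified program. The function name run_vector_add is hypothetical (the original code sits directly in main), and the return value is added only so the result can be observed. The asm statement is the one quoted above and assumes a compiler that supports GCC-style extended inline assembly.

```cpp
// Sketch only: the question's two loops plus the empty asm barrier.
// run_vector_add is a hypothetical name; the original code is in main.
int run_vector_add() {
  int a[2048], b[2048], c[2048];

  // First loop: zero-fill b and c.
  for (int i = 0; i < 2048; i++) {
    b[i] = 0;
    c[i] = 0;
  }

  // Second loop: element-wise addition, the candidate for vectorization.
  for (int i = 0; i < 2048; i++) {
    a[i] = b[i] + c[i];
  }

  // Empty asm statement that names all three arrays as read/write memory
  // operands and clobbers memory. It emits no instructions, but the
  // compiler must assume the arrays are used, so it cannot discard the
  // loops as dead code.
  asm volatile("" : "+m"(a), "+m"(b), "+m"(c) : : "memory");

  // Added for illustration: every element of a is 0 + 0.
  return a[0] + a[2047];
}
```

Without the barrier nothing in the function ever reads a, so the optimizer is free to delete both loops before the vectorizer sees them. Calling run_vector_add from main and building with the flags given further down in this answer should let you compare the vectorizer report with and without the asm line.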

I'm using gcc version 4.9.0 20140521 (prerelease) (GCC).

An older machine with gcc version 4.7.2 (Debian 4.7.2-5) also vectorises the loop, but in a different way. Your -ftree-vectorizer-verbose=2 setting makes it emit the following output:

Analyzing loop at foo155.cc:10


Vectorizing loop at foo155.cc:10

10: LOOP VECTORIZED.
foo155.cc:1: note: vectorized 1 loops in function.

You probably goofed your compiler flags (I used g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 -march=native foo155.cc -o foo155 to build) or have a really old compiler.
