Understanding gcc 4.9.2 auto-vectorization output


Question


I am trying to learn the gcc auto-vectorization module, after reading its documentation.

Here is what I tried (Debian jessie amd64):

$ cat ex1.c
int a[256], b[256], c[256];
foo () {
  int i;

  for (i=0; i<256; i++){
    a[i] = b[i] + c[i];
  }
}

And then, I simply run:

$ gcc  -x c -Ofast -msse2 -c   -ftree-vectorize -fopt-info-vec-missed ex1.c
ex1.c:5:3: note: misalign = 0 bytes of ref b[i_11]
ex1.c:5:3: note: misalign = 0 bytes of ref c[i_11]
ex1.c:5:3: note: misalign = 0 bytes of ref a[i_11]
ex1.c:5:3: note: virtual phi. skip.
ex1.c:5:3: note: num. args = 4 (not unary/binary/ternary op).
ex1.c:5:3: note: not ssa-name.
ex1.c:5:3: note: use not simple.
ex1.c:5:3: note: num. args = 4 (not unary/binary/ternary op).
ex1.c:5:3: note: not ssa-name.
ex1.c:5:3: note: use not simple.
ex1.c:2:1: note: not vectorized: not enough data-refs in basic block.
ex1.c:6:13: note: not vectorized: no vectype for stmt: vect__4.5_1 = MEM[(int *)vectp_b.3_9];
 scalar_type: vector(4) int
ex1.c:6:13: note: not vectorized: not enough data-refs in basic block.
ex1.c:2:1: note: not vectorized: not enough data-refs in basic block.
ex1.c:8:1: note: not vectorized: not enough data-refs in basic block.

Based on the documentation, I would have expected to see a clear line saying something like:

ex1.c:5: note: LOOP VECTORIZED.

But that is not the case. I used the command-line option -fopt-info-vec-missed because -ftree-vectorizer-verbose is now a no-op, according to a report.

So my question is: how do I read the above output to tell whether the loop was actually vectorized?

In case that helps:

$ gcc -dumpversion
4.9.2

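Solution

Actually, after digging through the gcc online documentation, I finally found that I should use -fopt-info-vec-optimized instead (or maybe -fopt-info-vec-all). See the gcc documentation, for example https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html: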

optimized: Print information when an optimization is successfully applied. It is up to a pass to decide which information is relevant. For example, the vectorizer passes print the source location of loops which are successfully vectorized.

missed: Print information about missed optimizations. Individual passes control which information to include in the output.

note: Print verbose information about optimizations, such as certain transformations, more detailed messages about decisions etc.

all: Print detailed optimization information. This includes ‘optimized’, ‘missed’, and ‘note’.
