Differences between pragmas simd and ivdep vector always?


Question

I am currently trying to vectorize a program, and I have observed an odd behaviour.

It seems that a for loop is vectorized when using

#pragma simd

(262): (col. 3) remark: SIMD LOOP WAS VECTORIZED.

but not when I use

#pragma vector always

#pragma ivdep

(262): (col. 3) remark: loop was not vectorized: existence of vector dependence.

I always thought that both forms produce the same vectorization.

Recommended answer

#pragma simd is an explicit vectorization tool that lets the developer enforce vectorization, as described at https://software.intel.com/en-us/node/514582, while #pragma vector tells the compiler that a loop should be vectorized according to its argument(s). Here the argument is always, which means "ignore the compiler's cost/efficiency heuristics and go ahead with vectorization". More information on #pragma vector is available at https://software.intel.com/en-us/node/514586. The fact that #pragma simd succeeds in vectorizing a loop that #pragma vector always failed to vectorize does not mean that #pragma simd produces wrong results: when it is used with the right set of clauses, it can vectorize the loop and still produce a correct result.

Below is a small code snippet which demonstrates that:

void foo(float *a, float *b, float *c, int N){
    #pragma vector always
    #pragma ivdep
    //#pragma simd vectorlength(2)
    for(int i = 2; i < N; i++)
        a[i] = a[i-2] + b[i] + c[i];
    return;
}

Compiling this code using ICC will produce the following vectorization report:

$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: loop was not vectorized: existence of vector dependence

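By default, ICC targets SSE2, which uses 128-bit XMM registers. One XMM register can hold 4 floats, but when the loop is processed 4 floats at a time there is a vector dependence: a[i] reads a[i-2], which is computed only two iterations earlier and would therefore belong to the same vector. #pragma vector always overrides only the cost heuristics, and #pragma ivdep only tells the compiler to ignore assumed dependences, not a proven one like this, so the report is right. If we consider just 2 floats instead of 4, however, we can vectorize this loop without corrupting the results. The version below and its vectorization report show this: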

void foo(float *a, float *b, float *c, int N){
    //#pragma vector always
    //#pragma ivdep
    #pragma simd vectorlength(2)
    for(int i = 2; i < N; i++)
        a[i] = a[i-2] + b[i] + c[i];
    return;
}

$ icc -c -vec-report2 test11.cc
test11.cc(5): (col. 1) remark: SIMD LOOP WAS VECTORIZED

But #pragma vector has no clause that explicitly specifies the vector length to use when vectorizing the loop. This is where #pragma simd really comes in handy.

When #pragma simd is used with the right clauses, which describe the computation in vector terms, the compiler generates the requested vector code without producing wrong results. The Intel(R) Cilk(TM) Plus White Paper, published at https://software.intel.com/sites/default/files/article/402486/intel-cilk-plus-white-paper.pdf, has sections on "Usage of #pragma simd vectorlength clause" and "Usage of #pragma simd reduction and private clause" that explain how to use #pragma simd with the right clauses. These clauses let developers tell the compiler what they want to achieve, and the compiler generates vector code accordingly. It is highly recommended to use #pragma simd with the relevant clauses wherever needed to express the loop logic to the compiler as precisely as possible.

Traditionally, inner loops are the target of vectorization, but #pragma simd can be used to vectorize outer loops too. More information is available at https://software.intel.com/en-us/articles/outer-loop-vectorization.
