Auto vectorization not working


Question

I'm trying to get my code to auto vectorize, but it isn't working.

int _tmain(int argc, _TCHAR* argv[])
{
    const int N = 4096;
    float x[N];
    float y[N];
    float sum = 0;

    //create random values for x and y 
    for (int i = 0; i < N; i++)
    {
        x[i] = rand() >> 1;
        y[i] = rand() >> 1;
    }

    for (int i = 0; i < N; i++){
        sum += x[i] * y[i];
    }
}

Neither loop vectorizes here, but I'm really only interested in the second loop.

I'm using Visual Studio Express 2013 and compiling with the /O2 and /Qvec-report:2 options (the latter reports whether or not each loop was vectorized). When I compile, I get the following messages:

--- Analyzing function: main
c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(28) : info C5002: loop not vectorized due to reason '1200'
c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(41) : info C5002: loop not vectorized due to reason '1305'

Reason '1305', according to the documentation, means that "the compiler can't discern proper vectorizable type information for this loop." I'm not really sure what this means. Any ideas?

After splitting the second loop into two loops:

for (int i = 0; i < N; i++){
    sumarray[i] = x[i] * y[i];
}

for (int i = 0; i < N; i++){
    sum += sumarray[i];
}

Now the first of the above loops vectorizes, but the second one does not, again with error code 1305.

Solution

Reason 1305 is reported because the value sum is never used, so the optimizer does not vectorize the loop. Simply adding printf("%f\n", sum) after the loop fixes that. But then you get a new reason code, 1105: "Loop includes a non-recognized reduction operation". To fix this you need to set /fp:fast.

The reason is that floating-point arithmetic is not associative, and a reduction using SIMD or MIMD (i.e. using multiple threads) adds the terms in a different order, which is only valid if the operation is associative. By using a looser floating-point model you allow the compiler to do the reduction.
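A minimal sketch of the non-associativity, with hypothetical function names chosen only for this illustration. The two functions add the same three floats with different groupings:

```c
/* Adds three floats with two different groupings.
   Under strict IEEE evaluation the two results can differ. */
float add_left_first(float a, float b, float c) {
    return (a + b) + c;   /* a and b are combined first */
}

float add_right_first(float a, float b, float c) {
    return a + (b + c);   /* b and c are combined first */
}
```

For example, with a = 1e20f, b = -1e20f and c = 1.0f, the first grouping gives 1.0f. In the second, the 1.0f is lost when it is added to -1e20f, so the result is 0.0f (assuming the code itself is compiled with a strict floating-point model).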

I just tested it with the following code: with the default /fp:precise the loop does not vectorize, and with /fp:fast it does.

#include <stdio.h>
int main() {
    const int N = 4096;
    float x[N];
    float y[N];
    float sum = 0;
    for (int i = 0; i < N; i++){
        sum += x[i] * y[i];
    }
    printf("sum %f\n", sum);
}
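To see why the reduction needs a relaxed floating-point model, here is a rough sketch of the kind of reordering a 4-wide SIMD reduction implies. This is hand-written scalar C, not the compiler's actual output. The function names and the lane count of 4 are assumptions for illustration. The second function keeps four partial sums and combines them at the end, so the additions happen in a different order than in the plain loop.

```c
#include <stddef.h>

/* Plain loop: terms are added strictly in index order. */
float dot_sequential(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Emulates a 4-lane reduction: each lane accumulates every 4th term. */
float dot_four_lanes(const float *x, const float *y, size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (size_t k = 0; k < 4; k++) {
            lane[k] += x[i + k] * y[i + k];
        }
    }
    /* Combine the lanes, then handle the leftover elements. */
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For inputs whose products and partial sums are all exactly representable (small integers, say), both functions agree. For general data they can differ in the last bits, and that is the reordering /fp:precise does not let the compiler make on its own.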

As for the loop with the rand() function: rand() is not a SIMD function, so that loop can't be vectorized. You would need to find a SIMD rand() function; I don't know of one. An alternative is to pre-compute an array of random numbers and use the array instead. In any case, rand() is a horrible random number generator and is only useful for some toy cases. Consider using the Mersenne Twister PRNG.
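The pre-computing alternative could look roughly like the sketch below. The helper name fill_random and its seed parameter are hypothetical. The idea is only that the scalar rand() calls are kept in their own setup loop, outside the loop you want vectorized.

```c
#include <stdlib.h>
#include <stddef.h>

/* Fills dst with n pseudo-random values using the scalar rand().
   This loop is not expected to vectorize; it is meant to run once, up front. */
void fill_random(float *dst, size_t n, unsigned int seed) {
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (float)(rand() >> 1);
    }
}
```

Calling fill_random(x, N, 1) and fill_random(y, N, 2) before the dot-product loop leaves only array reads inside that loop.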
