How to vectorize a distance calculation using SSE2


Problem description

A and B are vectors of length N, where N could be in the range 20 to 200, say. I want to calculate the squared distance between these vectors, i.e. d^2 = ||A-B||^2.

So far I have:

float* a = ...;
float* b = ...;
float d2 = 0;

for(int k = 0; k < N; ++k)
{
    float d = a[k] - b[k];
    d2 += d * d;
}

That seems to work fine, except that I have profiled my code and this loop is the bottleneck (more than 50% of the time is spent just doing this). I am using Visual Studio 2012 on Win 7, with these optimization options: /O2 /Oi /Ot /Oy-. My understanding is that VS2012 should auto-vectorize that loop (using SSE2). However, if I insert #pragma loop(no_vector) in the code I don't get a noticeable slowdown, so I guess the loop is not being vectorized. The compiler confirms that with this message:

  info C5002: loop not vectorized due to reason '1105'

My questions are:

  1. Is it possible to fix this code so that VS2012 can vectorize it?
  2. If not, would it make sense to try to vectorize the code myself?
  3. Can you recommend a web site for me to learn about SSE2 coding?
  4. Is there some value of N below which vectorization would be counter productive?
  5. What is reason '1105'?

Solution

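According to the MSDN documentation, reason code 1105 means the compiler was not able to figure out how to reduce the loop to vectorized instructions; here the reduction is the running sum accumulated into d2. A vectorized sum adds the terms in a different order than the scalar loop does, and floating-point addition is not associative, so the result can differ slightly. That is why, for floating-point operations, the documentation indicates that you need to specify the /fp:fast option to enable any floating-point reductions at all.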
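
If /fp:fast is not an option, or the loop still is not vectorized, writing the loop by hand with intrinsics is a reasonable thing to try. The following is a minimal sketch, not a tuned implementation. The function name squared_distance_sse is made up for illustration, and the sketch assumes nothing about alignment, so it uses unaligned loads. The single-precision intrinsics it uses come from the original SSE set, which every SSE2-capable target supports. Note that it sums in a different order than the scalar loop, so the result may differ in the last bits.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics
#include <cstddef>

// Hypothetical helper (name chosen for this example): returns the squared
// Euclidean distance between a[0..n) and b[0..n).
float squared_distance_sse(const float* a, const float* b, std::size_t n)
{
    __m128 acc = _mm_setzero_ps();  // four partial sums, one per lane
    std::size_t k = 0;

    // Main loop: process four floats per iteration.
    for (; k + 4 <= n; k += 4) {
        __m128 va = _mm_loadu_ps(a + k);  // unaligned load: no alignment assumed
        __m128 vb = _mm_loadu_ps(b + k);
        __m128 d  = _mm_sub_ps(va, vb);
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }

    // Horizontal sum of the four lanes of acc.
    __m128 shuf = _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(acc, shuf);   // lane0 = acc0+acc1, lane2 = acc2+acc3
    shuf = _mm_movehl_ps(shuf, sums);      // move the upper pair sum into lane0
    sums = _mm_add_ss(sums, shuf);         // lane0 = acc0+acc1+acc2+acc3
    float d2 = _mm_cvtss_f32(sums);

    // Scalar tail for the remaining n % 4 elements.
    for (; k < n; ++k) {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}
```

With N between 20 and 200 the main loop runs only 5 to 50 times, so the horizontal sum and the scalar tail are a visible share of the work. Whether this beats the scalar loop for small N is something to measure in the profiler rather than assume.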
