Is it possible to vectorize this nested for with SSE?

Views: 37 Published: 2021/8/27 19:47:11 Tags: c++ x86 vectorization sse simd


Problem description

I've never written assembly code for SSE optimization, so sorry if this is a noob question. This article explains how to vectorize a for loop with a conditional statement. However, my code (taken from here) is of the form:

   for (int j=-halfHeight; j<=halfHeight; ++j)
   {
      for(int i=-halfWidth; i<=halfWidth; ++i)
      {
         const float rx = ofsx + j * a12;
         const float ry = ofsy + j * a22;
         float wx = rx + i * a11;
         float wy = ry + i * a21;
         const int x = (int) floor(wx);
         const int y = (int) floor(wy);
         if (x >= 0 && y >= 0 && x < width && y < height)
         {
            // compute weights
            wx -= x; wy -= y;
            // bilinear interpolation
            *out++ =
               (1.0f - wy) * ((1.0f - wx) * im.at<float>(y,x)   + wx * im.at<float>(y,x+1)) +
               (       wy) * ((1.0f - wx) * im.at<float>(y+1,x) + wx * im.at<float>(y+1,x+1));
         } else {
            *out++ = 0;
         }
      }
   }

So, as I understand it, there are several differences from the linked article:

Here we have a nested for: I've always seen a single-level for in vectorization examples, never a nested loop.
The if condition is based on scalar values (x and y) and not on an array: how can I adapt the linked example to this?
The out index isn't based on i or j (so it's not out[i] or out[j]): how can I fill out in this way?
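On the first and third points, one observation that may help: because `*out++` is executed exactly once per iteration in both branches, the position it writes to is a pure function of `j` and `i`. The sketch below (the helper names `outIndex` and `indicesInLoopOrder` are hypothetical, not from the original code) checks that mapping by walking the same nested loop and recording the computed index instead of bumping a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the slot that `*out++` would have reached at (j, i),
// assuming the loop bounds of the original code.
inline std::size_t outIndex(int j, int i, int halfWidth, int halfHeight)
{
    assert(halfWidth >= 0 && halfHeight >= 0);
    const std::size_t rowLen = 2 * static_cast<std::size_t>(halfWidth) + 1;
    return static_cast<std::size_t>(j + halfHeight) * rowLen
         + static_cast<std::size_t>(i + halfWidth);
}

// Walks the nested loop in its original order and records the computed index
// for every iteration.
std::vector<std::size_t> indicesInLoopOrder(int halfWidth, int halfHeight)
{
    std::vector<std::size_t> idx;
    for (int j = -halfHeight; j <= halfHeight; ++j)
        for (int i = -halfWidth; i <= halfWidth; ++i)
            idx.push_back(outIndex(j, i, halfWidth, halfHeight));
    return idx;
}
```

If the recorded indices come out as 0, 1, 2, ... in order, then `out[outIndex(j, i, ...)] = value` writes the same slots as `*out++ = value`. With that form the inner loop no longer carries the pointer from one iteration to the next, and each row of the outer loop can be treated as an independent one-level loop.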

In particular, I'm confused because for indexes are usually used as array indexes, whereas here they are used to compute variables while the output vector is advanced by one element per iteration.
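To make the "index used to compute variables" case concrete, here is a minimal sketch with SSE2 intrinsics, assuming an x86 target. Four consecutive values of `i` are placed in the lanes of one register, `wx` and `wy` are computed for all four at once, and the scalar `if` becomes a per-lane mask. The function names `floorToInt` and `inBoundsMask4` are made up for illustration. The floor is emulated with truncation because SSE2 has no packed floor instruction (`_mm_floor_ps` requires SSE4.1).

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// floor() of four floats using SSE2 only: truncate toward zero, then
// subtract 1 in the lanes where truncation rounded up (negative non-integers).
static inline __m128i floorToInt(__m128 v)
{
    const __m128i t  = _mm_cvttps_epi32(v);
    const __m128  tf = _mm_cvtepi32_ps(t);
    const __m128  roundedUp = _mm_cmplt_ps(v, tf);  // all-ones (integer -1) where v < trunc(v)
    return _mm_add_epi32(t, _mm_castps_si128(roundedUp));
}

// Hypothetical helper: returns a 4-bit mask in which bit k is set when the
// sample for index i0 + k passes x >= 0 && y >= 0 && x < width && y < height.
int inBoundsMask4(int i0, float rx, float ry, float a11, float a21,
                  int width, int height)
{
    assert(width > 0 && height > 0);
    // The loop index itself becomes a vector: lanes hold i0, i0+1, i0+2, i0+3.
    const __m128 iv = _mm_setr_ps(float(i0), float(i0 + 1),
                                  float(i0 + 2), float(i0 + 3));
    const __m128 wx = _mm_add_ps(_mm_set1_ps(rx), _mm_mul_ps(iv, _mm_set1_ps(a11)));
    const __m128 wy = _mm_add_ps(_mm_set1_ps(ry), _mm_mul_ps(iv, _mm_set1_ps(a21)));
    const __m128i x = floorToInt(wx);
    const __m128i y = floorToInt(wy);

    // "x >= 0" is written as "x > -1" because SSE2 only offers > and < for integers.
    const __m128i minus1 = _mm_set1_epi32(-1);
    __m128i ok = _mm_and_si128(_mm_cmpgt_epi32(x, minus1),
                               _mm_cmpgt_epi32(y, minus1));
    ok = _mm_and_si128(ok, _mm_cmplt_epi32(x, _mm_set1_epi32(width)));
    ok = _mm_and_si128(ok, _mm_cmplt_epi32(y, _mm_set1_epi32(height)));
    return _mm_movemask_ps(_mm_castsi128_ps(ok));
}
```

In a full kernel the mask would typically be kept in a register and ANDed with the interpolated result so that out-of-bounds lanes become 0, rather than being extracted to an integer as done here for inspection. The four pixel loads per lane remain the hard part, since SSE has no gather instruction.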

I'm using icpc with -O3 -xCORE-AVX2 -qopt-report=5 and a bunch of other optimization flags. According to Intel Advisor, the loop is not vectorized, and using #pragma omp simd generates warning #15552: loop was not vectorized with "simd".
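For reference, below is a sketch of an inner loop written in a form that compilers tend to find easier to vectorize: the output is indexed instead of using `*out++`, and the branch is replaced by a select. Whether a given compiler actually vectorizes it is not guaranteed and depends on the target and flags. The `Image` struct is a hypothetical row-major stand-in for `im.at<float>(y, x)`, and `warpRow` is an invented name. Note one deliberate difference from the original: the original reads `x+1` and `y+1`, which falls outside the image when `x == width-1` or `y == height-1`, while this sketch clamps those neighbor coordinates to the last column and row.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the OpenCV image: row-major float pixels.
struct Image {
    int width, height;
    std::vector<float> data;
    float at(int y, int x) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Processes one row of the outer loop (fixed j, so rx and ry are constants).
// outRow must have room for 2 * halfWidth + 1 floats.
void warpRow(const Image& im, float rx, float ry, float a11, float a21,
             int halfWidth, float* outRow)
{
    assert(im.width > 0 && im.height > 0);
    #pragma omp simd
    for (int i = -halfWidth; i <= halfWidth; ++i) {
        float wx = rx + i * a11;
        float wy = ry + i * a21;
        const int x = static_cast<int>(std::floor(wx));
        const int y = static_cast<int>(std::floor(wy));
        const bool inside = x >= 0 && y >= 0 && x < im.width && y < im.height;

        // Clamp so the four loads are always valid, even for samples that
        // will be discarded; this lets every iteration do the same work.
        const int x0 = std::min(std::max(x, 0), im.width - 1);
        const int y0 = std::min(std::max(y, 0), im.height - 1);
        const int x1 = std::min(x0 + 1, im.width - 1);
        const int y1 = std::min(y0 + 1, im.height - 1);

        // Fractional weights, as in the original code.
        wx -= x;
        wy -= y;
        const float v =
            (1.0f - wy) * ((1.0f - wx) * im.at(y0, x0) + wx * im.at(y0, x1)) +
            (       wy) * ((1.0f - wx) * im.at(y1, x0) + wx * im.at(y1, x1));

        // Select instead of branch: out-of-bounds samples produce 0.
        outRow[i + halfWidth] = inside ? v : 0.0f;
    }
}
```

The outer loop over `j` would call this once per row with `rx = ofsx + j * a12`, `ry = ofsy + j * a22`, and `outRow = out + (j + halfHeight) * (2 * halfWidth + 1)`. Without OpenMP enabled the pragma is simply ignored and the loop runs as ordinary scalar code.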
