Auto-vectorization of a loop containing comparisons

Question

I'm trying to use the Visual C++ 2013 auto-vectorizer to vectorize the following loop (/arch:AVX2), but the compiler refuses and gives the following message:

info C5002: loop not vectorized due to reason '1100'

This reason code means:

Loop contains control flow—for example, "if" or "?".

I have tried splitting the comparisons and the final assignment into separate loops, but that seems inefficient when there are intrinsics available for comparing floating-point values.

Why would the compiler treat comparisons as control flow, and what can I change in the implementation so that the compiler vectorizes this function?

#include <cstdint>

void triplets_positive(
    const std::uint64_t count,
    double * const a,
    double * const b,
    double * const c,
    std::uint64_t * const all_positive)
{
    for (std::uint64_t i = 0; i < count; ++i)
    {
        // These >= operations make the loop not vectorisable because
        // they introduce control flow.
        std::uint64_t a_pos = (a[i] >= 0.0);
        std::uint64_t b_pos = (b[i] >= 0.0);
        std::uint64_t c_pos = (c[i] >= 0.0);

        all_positive[i] = a_pos & b_pos & c_pos;
    }
}

Answer

Unfortunately, this appears to be either a bug or a limitation in the Visual C++ 2013 compiler. Other compilers vectorize the loop using either the VCMPPD instruction (AVX/AVX2) or the CMP*PD instructions such as CMPLEPD (SSE2).
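
For comparison, the same loop can be vectorized by hand with intrinsics. This is a minimal sketch, not code from the answer: it assumes an x86/x86-64 target with SSE2 (baseline on x86-64), and the name triplets_positive_sse2 is hypothetical. _mm_cmpge_pd yields an all-ones lane where the comparison holds and an all-zeros lane otherwise (including for NaN, matching scalar >=), the three masks are ANDed, and a logical right shift by 63 turns each all-ones lane into 1. With /arch:AVX2 the 256-bit analogue would use _mm256_cmp_pd with the _CMP_GE_OQ predicate and handle four doubles per iteration.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical hand-vectorized variant of triplets_positive (SSE2 assumed).
void triplets_positive_sse2(const std::uint64_t count,
                            const double* const a,
                            const double* const b,
                            const double* const c,
                            std::uint64_t* const all_positive)
{
    const __m128d zero = _mm_setzero_pd();
    std::uint64_t i = 0;

    // Process two doubles per iteration.
    for (; i + 2 <= count; i += 2)
    {
        // Each lane becomes all-ones if x >= 0.0, all-zeros otherwise.
        const __m128d ma = _mm_cmpge_pd(_mm_loadu_pd(a + i), zero);
        const __m128d mb = _mm_cmpge_pd(_mm_loadu_pd(b + i), zero);
        const __m128d mc = _mm_cmpge_pd(_mm_loadu_pd(c + i), zero);

        // Combine the masks, then shift each 64-bit lane right by 63
        // so all-ones becomes 1 and all-zeros stays 0.
        const __m128d m = _mm_and_pd(_mm_and_pd(ma, mb), mc);
        const __m128i r = _mm_srli_epi64(_mm_castpd_si128(m), 63);

        _mm_storeu_si128(reinterpret_cast<__m128i*>(all_positive + i), r);
    }

    // Scalar tail for the last element when count is odd.
    for (; i < count; ++i)
    {
        all_positive[i] = static_cast<std::uint64_t>(
            (a[i] >= 0.0) & (b[i] >= 0.0) & (c[i] >= 0.0));
    }
}
```

Unaligned loads and stores are used so the caller does not need 16-byte-aligned arrays; the scalar tail keeps the result identical to the original loop for any count.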

Compilers that successfully vectorise this loop include:

  • Visual C++ 2017
  • Visual C++ 2015
  • Clang + LLVM (Apple LLVM version 8.1.0 (clang-802.0.42))

While it's theoretically possible to write the comparisons as bitwise operations, that is counterproductive; the best option is to upgrade to a newer or different compiler, such as one of those listed above.
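
As one illustration of the pitfalls, consider the most obvious bitwise candidate: testing the sign bit. This is a sketch assuming double is IEEE 754 binary64, and sign_bit_clear is a hypothetical helper, not something from the answer. It agrees with x >= 0.0 for ordinary values but not for -0.0 (sign bit set, yet -0.0 >= 0.0 is true) or for a NaN whose sign bit is clear (x >= 0.0 is false for every NaN), so an exact bitwise replacement needs extra terms for those cases.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns 1 if the sign bit of x is clear, 0 otherwise.
// Purely bitwise, but NOT an exact replacement for (x >= 0.0):
// it differs for -0.0 and for NaNs with a clear sign bit.
std::uint64_t sign_bit_clear(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun (assumes IEEE 754 binary64)
    return (~bits) >> 63;                 // bit 63 is the sign bit
}
```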