SIMD code works in Debug, but does not in Release

Question

This code works in debug mode, but in release mode it panics because the assert fails.

use std::arch::x86_64::*;

fn main() {
    unsafe {
        let a = vec![2.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
        let b = -1.0f32;

        let ar = _mm256_loadu_ps(a.as_ptr());
        println!("ar: {:?}", ar);

        let br = _mm256_set1_ps(b);
        println!("br: {:?}", br);

        let mut abr = _mm256_setzero_ps();
        println!("abr: {:?}", abr);

        abr = _mm256_fmadd_ps(ar, br, abr);
        println!("abr: {:?}", abr);

        let mut ab = [0.0; 8];
        _mm256_storeu_ps(ab.as_mut_ptr(), abr);
        println!("ab: {:?}", ab);

        assert_eq!(ab[0], -2.0f32);
    }
}

(Playground)

Recommended answer

I can indeed confirm that this code causes the assert to trip in release mode:

$ cargo run --release
    Finished release [optimized] target(s) in 0.00s
     Running `target/release/so53831502`
ar: __m256(2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
br: __m256(-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0)
abr: __m256(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
abr: __m256(-1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0)
ab: [-1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `-1.0`,
 right: `-2.0`', src/main.rs:24:9

This appears to be a compiler bug (the original answer linked two related reports). In particular, you are calling routines like _mm256_set1_ps and _mm256_fmadd_ps, which require the CPU features avx and fma respectively, but neither your code nor your compilation command tells the compiler that those features should be enabled.

One way of fixing this is to tell the compiler to compile the entire program with both the avx and fma features enabled, like so:

$ RUSTFLAGS="-C target-feature=+avx,+fma" cargo run --release
   Compiling so53831502 v0.1.0 (/tmp/so53831502)
    Finished release [optimized] target(s) in 0.36s
     Running `target/release/so53831502`
ar: __m256(2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
br: __m256(-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0)
abr: __m256(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
abr: __m256(-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
ab: [-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Another approach that achieves the same result is to tell the compiler to use every feature available on the CPU you are building on:

$ RUSTFLAGS="-C target-cpu=native" cargo run --release
   Compiling so53831502 v0.1.0 (/tmp/so53831502)
    Finished release [optimized] target(s) in 0.34s
     Running `target/release/so53831502`
ar: __m256(2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
br: __m256(-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0)
abr: __m256(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
abr: __m256(-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
ab: [-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
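To check which of these features a given build was actually compiled with, something like the following sketch may help. It is not part of the original answer; the function name compiled_with_avx_fma is made up for illustration. It relies on cfg!(target_feature = "..."), which is evaluated at compile time and therefore reflects the flags passed to the compiler, not what the running CPU supports.

```rust
/// Reports whether this program was *compiled* with avx and fma enabled.
/// (Hypothetical helper name, used only for this illustration.)
/// `cfg!` is resolved at compile time, so the result depends on
/// `-C target-feature` / `-C target-cpu`, not on the CPU at runtime.
fn compiled_with_avx_fma() -> (bool, bool) {
    (cfg!(target_feature = "avx"), cfg!(target_feature = "fma"))
}

fn main() {
    let (avx, fma) = compiled_with_avx_fma();
    println!("compiled with avx: {}, fma: {}", avx, fma);
}
```

Built with a plain cargo run --release on a default x86_64 target, both values would normally be false; built with either RUSTFLAGS setting above on a CPU that has the features, both should be true.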

However, both of these compilation commands produce binaries that can only run on CPUs that support the avx and fma features. If that's not a problem for you, then this is a fine solution. If you would instead like to build portable binaries, you can perform CPU feature detection at runtime and compile only certain functions with specific CPU features enabled. It is then your responsibility to guarantee that those functions are invoked only when the corresponding CPU features are available on the running CPU. This process is documented in the dynamic CPU feature detection section of the std::arch docs.

Here's an example that uses runtime CPU feature detection:

use std::arch::x86_64::*;
use std::process;

fn main() {
    if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
        // SAFETY: This is safe because we're guaranteed to support the
        // necessary CPU features.
        unsafe { doit(); }
    } else {
        eprintln!("unsupported CPU");
        process::exit(1);
    }
}

#[target_feature(enable = "avx,fma")]
unsafe fn doit() {
    let a = vec![2.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    let b = -1.0f32;

    let ar = _mm256_loadu_ps(a.as_ptr());
    println!("ar: {:?}", ar);

    let br = _mm256_set1_ps(b);
    println!("br: {:?}", br);

    let mut abr = _mm256_setzero_ps();
    println!("abr: {:?}", abr);

    abr = _mm256_fmadd_ps(ar, br, abr);
    println!("abr: {:?}", abr);

    let mut ab = [0.0; 8];
    _mm256_storeu_ps(ab.as_mut_ptr(), abr);
    println!("ab: {:?}", ab);

    assert_eq!(ab[0], -2.0f32);
}

To run it, you no longer need to set any compilation flags:

$ cargo run --release
   Compiling so53831502 v0.1.0 (/tmp/so53831502)
    Finished release [optimized] target(s) in 0.29s
     Running `target/release/so53831502`
ar: __m256(2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
br: __m256(-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0)
abr: __m256(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
abr: __m256(-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
ab: [-2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

If you run the resulting binary on a CPU that lacks either avx or fma, the program should exit with the error message: unsupported CPU.
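If exiting is not acceptable, one option is to fall back to a scalar implementation instead. The following is a minimal sketch of that idea, not something from the original answer; the names mul_add8, mul_add8_scalar and mul_add8_avx_fma are hypothetical. Only plain arrays cross the function boundaries, so no vector type appears in any signature.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Computes a[i] * b + c[i] for eight lanes, picking an implementation
/// at runtime. (All function names here are illustrative.)
fn mul_add8(a: &[f32; 8], b: f32, c: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // SAFETY: both features were just detected on the running CPU.
            return unsafe { mul_add8_avx_fma(a, b, c) };
        }
    }
    // Portable path for CPUs (or architectures) without avx + fma.
    mul_add8_scalar(a, b, c)
}

/// Plain scalar version; needs no special CPU features.
fn mul_add8_scalar(a: &[f32; 8], b: f32, c: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] * b + c[i];
    }
    out
}

/// Vectorized version; must only be called when avx and fma are available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn mul_add8_avx_fma(a: &[f32; 8], b: f32, c: &[f32; 8]) -> [f32; 8] {
    let ar = _mm256_loadu_ps(a.as_ptr());
    let br = _mm256_set1_ps(b);
    let cr = _mm256_loadu_ps(c.as_ptr());
    let r = _mm256_fmadd_ps(ar, br, cr);
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), r);
    out
}

fn main() {
    let a = [2.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
    let out = mul_add8(&a, -1.0, &[0.0; 8]);
    println!("out: {:?}", out);
    assert_eq!(out[0], -2.0f32);
}
```

Note that a fused multiply-add rounds once while the scalar path rounds twice, so the two paths can differ in the last bit for inputs that are not exactly representable. For the values used here they agree.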

In general, I think the docs for std::arch could be improved. In particular, the key boundary at which you need to split your code depends on whether vector types appear in your function signature. The doit routine neither takes nor returns a vector type, so calling it requires nothing beyond the standard x86 (or x86_64) function ABI, and it is thus safe to call from functions that were not themselves compiled with avx or fma. Internally, however, the function has been told to compile its code using additional instruction set extensions based on the given CPU features. This is achieved via the target_feature attribute. If, for example, you supplied an incorrect target feature:

#[target_feature(enable = "ssse3")]
unsafe fn doit() {
    // ...
}

then the program exhibits the same behavior as your initial program, because ssse3 enables neither avx nor fma.
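To make the signature boundary concrete, here is a small sketch under my own assumptions; the helper names scale and scale_array are hypothetical and not from the original answer. The helper that takes and returns __m256 carries the same target_feature attribute as its caller, while the function called from ordinary code only exchanges plain arrays.

```rust
use std::arch::x86_64::*;

/// Helper with a vector type in its signature. It is annotated with the
/// same target feature as its caller, so both are compiled with avx.
/// (Hypothetical name, for illustration only.)
#[target_feature(enable = "avx")]
unsafe fn scale(v: __m256, factor: f32) -> __m256 {
    _mm256_mul_ps(v, _mm256_set1_ps(factor))
}

/// Boundary function: only plain arrays cross into and out of it,
/// so code compiled without avx can call it (after a runtime check).
#[target_feature(enable = "avx")]
unsafe fn scale_array(input: &[f32; 8], factor: f32) -> [f32; 8] {
    let v = _mm256_loadu_ps(input.as_ptr());
    let scaled = scale(v, factor);
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), scaled);
    out
}

fn main() {
    if is_x86_feature_detected!("avx") {
        let input = [2.0f32, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
        // SAFETY: avx was just detected on the running CPU.
        let out = unsafe { scale_array(&input, -1.0) };
        println!("out: {:?}", out);
        assert_eq!(out[0], -2.0f32);
    } else {
        eprintln!("unsupported CPU");
    }
}
```

This sketch only compiles for x86_64 targets, like the examples above.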
