_mm256_load_ps causes segmentation fault with google/benchmark in debug mode
Problem description
- The following code runs fine in both release and debug mode:
#include <immintrin.h>

constexpr int n_batch = 10240;
constexpr int n = n_batch * 8;

#pragma pack(32)
float a[n];
float b[n];
float c[n];
#pragma pack()

int main() {
    for(int i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
    for(int i = 0; i < n; i += 4) {
        __m128 av = _mm_load_ps(a + i);
        __m128 bv = _mm_load_ps(b + i);
        __m128 cv = _mm_mul_ps(av, bv);
        _mm_store_ps(c + i, cv);
    }
    for(int i = 0; i < n; i += 8) {
        __m256 av = _mm256_load_ps(a + i);
        __m256 bv = _mm256_load_ps(b + i);
        __m256 cv = _mm256_mul_ps(av, bv);
        _mm256_store_ps(c + i, cv);
    }
}
- The following code runs only in release mode; in debug mode it crashes with a segmentation fault:
#include <immintrin.h>
#include "benchmark/benchmark.h"

constexpr int n_batch = 10240;
constexpr int n = n_batch * 8;

#pragma pack(32)
float a[n];
float b[n];
float c[n];
#pragma pack()

static void BM_Scalar(benchmark::State &state) {
    for(auto _: state)
        for(int i = 0; i < n; ++i)
            c[i] = a[i] * b[i];
}
BENCHMARK(BM_Scalar);

static void BM_Packet_4(benchmark::State &state) {
    for(auto _: state) {
        for(int i = 0; i < n; i += 4) {
            __m128 av = _mm_load_ps(a + i);
            __m128 bv = _mm_load_ps(b + i);
            __m128 cv = _mm_mul_ps(av, bv);
            _mm_store_ps(c + i, cv);
        }
    }
}
BENCHMARK(BM_Packet_4);

static void BM_Packet_8(benchmark::State &state) {
    for(auto _: state) {
        for(int i = 0; i < n; i += 8) {
            __m256 av = _mm256_load_ps(a + i); // Signal: SIGSEGV (signal SIGSEGV: invalid address (fault address: 0x0))
            __m256 bv = _mm256_load_ps(b + i);
            __m256 cv = _mm256_mul_ps(av, bv);
            _mm256_store_ps(c + i, cv);
        }
    }
}
BENCHMARK(BM_Packet_8);

BENCHMARK_MAIN();
Answer
Your arrays aren't aligned by 32. You could check this with a debugger.

#pragma pack(32)

only aligns struct/union/class members, as documented by Microsoft. C++ arrays are a different kind of object and aren't affected at all by that MSVC pragma. (You're probably actually using GCC's or clang's version of it, though, because MSVC-generated code generally uses vmovups, not vmovaps.)

For arrays in static or automatic storage (not dynamically allocated), the easiest way to align them in C++11 and later is alignas(32). That's fully portable, unlike GNU C __attribute__((aligned(32))) or whatever MSVC's equivalent is.
alignas(32) float a[n];
alignas(32) float b[n];
alignas(32) float c[n];
The Q&A "AVX: data alignment: store crash, storeu, load, loadu" explains why there's a difference depending on optimization level: optimized code will fold a load into a memory source operand for vmulps, which (unlike SSE) doesn't require alignment. (Presumably the first array happens to be sufficiently aligned.)

Un-optimized code will do the _mm256_load_ps separately, with an alignment-required vmovaps load.

(_mm256_loadu_ps will always avoid using alignment-required loads, so use that if you can't guarantee your data is aligned.)