error: '_mm512_loadu_epi64' was not declared in this scope

Views: 197 | Published: 2020/11/13 0:11:42 | Tags: c++ gcc x86 intrinsics avx512


Question

I'm trying to create a minimal reproducer for this issue report. There seem to be some problems with AVX-512, which is shipping on the latest Apple machines with Skylake processors.

According to the GCC 6 release notes, AVX-512 support should be available. According to the Intel Intrinsics Guide, vmovdqu64 is available with AVX-512VL and AVX-512F:

$ cat test.cxx
#include <cstdint>
#include <immintrin.h>
int main(int argc, char* argv[])
{
    uint64_t x[8];
    __m512i y = _mm512_loadu_epi64(x);
    return 0;
}

Then:

$ /opt/local/bin/g++-mp-6 -mavx512f -Wa,-q test.cxx -o test.exe
test.cxx: In function 'int main(int, char**)':
test.cxx:6:37: error: '_mm512_loadu_epi64' was not declared in this scope
     __m512i y = _mm512_loadu_epi64(x);
                                     ^
$ /opt/local/bin/g++-mp-6 -mavx -mavx2 -mavx512f -Wa,-q test.cxx -o test.exe
test.cxx: In function 'int main(int, char**)':
test.cxx:6:37: error: '_mm512_loadu_epi64' was not declared in this scope
     __m512i y = _mm512_loadu_epi64(x);
                                     ^
$ /opt/local/bin/g++-mp-6 -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -Wa,-q test.cxx -o test.exe
test.cxx: In function 'int main(int, char**)':
test.cxx:6:37: error: '_mm512_loadu_epi64' was not declared in this scope
     __m512i y = _mm512_loadu_epi64(x);
                                     ^

I walked the options back to -msse2 without success. I seem to be missing something.

What is required to engage AVX-512 for modern GCC?

According to /opt/local/bin/g++-mp-6 -v, these are the header search paths:

#include "..." search starts here:
#include <...> search starts here:
 /opt/local/include/gcc6/c++/
 /opt/local/include/gcc6/c++//x86_64-apple-darwin13
 /opt/local/include/gcc6/c++//backward
 /opt/local/lib/gcc6/gcc/x86_64-apple-darwin13/6.5.0/include
 /opt/local/include
 /opt/local/lib/gcc6/gcc/x86_64-apple-darwin13/6.5.0/include-fixed
 /usr/include
 /System/Library/Frameworks
 /Library/Frameworks

Then:

$ grep -R '_mm512_' /opt/local/lib/gcc6/ | grep avx512f | head -n 8
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:_mm512_set_epi64 (long long __A, long long __B, long long __C,
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:_mm512_set_epi32 (int __A, int __B, int __C, int __D,
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:_mm512_set_pd (double __A, double __B, double __C, double __D,
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:_mm512_set_ps (float __A, float __B, float __C, float __D,
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7)                       \
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:  _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0)
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7,                       \
/opt/local/lib/gcc6//gcc/x86_64-apple-darwin13/6.5.0/include/avx512fintrin.h:  _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0)
...

Recommended answer

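With no masking, there is no reason for this intrinsic to exist, or to ever use it instead of the equivalent _mm512_loadu_si512. It is just confusing, and could trick human readers into thinking it is a vmovq zero-extending load of a single epi64.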

Intel's intrinsics finder does specify that it exists, but even current trunk gcc (on Godbolt) doesn't define it.

Almost all AVX512 instructions support merge-masking and zero-masking. Instructions that used to be purely bitwise / whole-register, with no meaningful element boundaries, now come in 32-bit and 64-bit element flavours, like vpxord and vpxorq, or vmovdqa32 and vmovdqa64. But using either version with no masking is still just a normal vector load / store / register copy, and it is not meaningful to specify anything about element size for them in C++ source with intrinsics, only the total vector width.
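To make the distinction concrete, here is a minimal sketch that contrasts an unmasked load with zero-masked and merge-masked loads. It assumes GCC or Clang on x86-64; the function names are hypothetical, and the `target` attribute is used only so the snippet compiles without -mavx512f on the command line.

```cpp
#include <cstdint>
#include <immintrin.h>

// Assumption: GCC or Clang on x86-64. The target attribute enables AVX-512F
// for this one function, so no -mavx512f flag is needed to compile it.
__attribute__((target("avx512f")))
void load_three_ways(const std::uint64_t* src,   // 8 input elements
                     std::uint64_t* plain_out,   // 8 elements: unmasked load
                     std::uint64_t* zero_out,    // 8 elements: zero-masked load
                     std::uint64_t* merge_out)   // 8 elements: merge-masked load
{
    // Unmasked: all 64 bytes are copied, so element size is irrelevant.
    __m512i plain = _mm512_loadu_si512(src);

    // Zero-masking: mask 0xF0 selects elements 4..7; elements 0..3 become 0.
    __m512i zeroed = _mm512_maskz_loadu_epi64(0xF0, src);

    // Merge-masking: elements 4..7 come from memory; elements 0..3 are kept
    // from the first argument (here a vector with every element set to -1).
    __m512i merged = _mm512_mask_loadu_epi64(_mm512_set1_epi64(-1), 0xF0, src);

    _mm512_storeu_si512(plain_out, plain);
    _mm512_storeu_si512(zero_out, zeroed);
    _mm512_storeu_si512(merge_out, merged);
}

// Runtime check (GCC/Clang builtin) so callers can skip the AVX-512 path
// on CPUs that do not support it.
bool cpu_has_avx512f() { return __builtin_cpu_supports("avx512f") != 0; }
```

Only the masked variants need the epi64 suffix, because the mask bits are applied per 64-bit element. Call load_three_ways only after cpu_has_avx512f() returns true.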

See also: What is the difference between _mm512_load_epi32 and _mm512_load_si512?

SSE* and AVX1/2 options are irrelevant to whether or not GCC headers define this intrinsic in terms of GCC built-ins; -mavx512f already implies all of the Intel SSE/AVX extensions before AVX512.
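One way to check this on a particular compiler is to look at the predefined feature macros. The sketch below assumes GCC or Clang, which define __AVX512F__, __AVX2__, __SSE4_2__ and similar macros for each enabled extension; the function name is hypothetical.

```cpp
// Returns true if the flags used for this translation unit are consistent
// with the claim that -mavx512f implies the older SSE/AVX extensions.
// Assumption: GCC/Clang-style predefined macros.
bool older_extensions_implied()
{
#if defined(__AVX512F__)
  #if defined(__AVX2__) && defined(__AVX__) && defined(__SSE4_2__) && \
      defined(__SSE4_1__) && defined(__SSE2__)
    return true;    // AVX-512F is on, and so are the older extensions
  #else
    return false;   // AVX-512F is on, but an older extension is missing
  #endif
#else
    return true;    // AVX-512F is not enabled here; nothing to check
#endif
}
```

Compiling this with only -mavx512f and calling the function should return true if the implication holds, which is why adding -mavx, -mavx2 or -msse4.x to the command line does not change the outcome.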

It is present in clang trunk (but not 7.0, so it was only very recently added).

unaligned _mm512_loadu_si512 - supported everywhere, use this.
unaligned _mm512_loadu_epi64 - clang trunk, not gcc.
aligned _mm512_load_si512 - supported everywhere, use this.
aligned _mm512_load_epi64 - also supported everywhere, surprisingly.
unaligned _mm512_maskz_loadu_epi64 - supported everywhere, use this for zero-masked loads.
unaligned _mm512_mask_loadu_epi64 - supported everywhere, use this for merge-masked loads.
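Applied to the reproducer from the question, the recommendation amounts to swapping one intrinsic. This is a minimal sketch, assuming GCC or Clang on x86-64. The helper names are hypothetical, and the store back to memory is there only so the result can be inspected.

```cpp
#include <cstdint>
#include <immintrin.h>

// Assumption: GCC or Clang on x86-64. The target attribute stands in for
// passing -mavx512f on the command line.
__attribute__((target("avx512f")))
void load_and_store_back(const std::uint64_t* x, std::uint64_t* out)
{
    // Was: __m512i y = _mm512_loadu_epi64(x);   // not declared by GCC's headers
    __m512i y = _mm512_loadu_si512(x);           // same unaligned 64-byte load
    _mm512_storeu_si512(out, y);
}

// Runtime guard (GCC/Clang builtin): call the function above only when true.
bool avx512f_usable() { return __builtin_cpu_supports("avx512f") != 0; }
```

Both arrays must hold at least eight 64-bit elements. No alignment is required, because these are the unaligned load and store forms.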

This code compiles on gcc as early as 4.9.0, and on mainline (Linux) clang as early as 3.9, both with -mavx512f (or, if they support it, -march=skylake-avx512 or -march=knl). I haven't tested with Apple Clang.

#include <immintrin.h>

__m512i loadu_si512(void *x) { return _mm512_loadu_si512(x); }
__m512i load_epi64(void *x)  {  return _mm512_load_epi64(x); }
//__m512i loadu_epi64(void *x) {  return _mm512_loadu_epi64(x); }

__m512i loadu_maskz(void *x) { return _mm512_maskz_loadu_epi64(0xf0, x); }
__m512i loadu_mask(void *x)  { return _mm512_mask_loadu_epi64(_mm512_setzero_si512(), 0xf0, x); }

Godbolt link; you can uncomment the _mm512_loadu_epi64 and flip the compiler to clang trunk to see it work there.
