Disable all AVX-512 instructions for a g++ build


Question

Hi, I'm trying to build without any AVX-512 instructions by using these flags: -march=native -mno-avx512f. However, I still get a binary that contains an AVX512 instruction (vmovss); I'm using elfx86exts to check. Any idea how to disable those?

Recommended answer

-march=native -mno-avx512f is the correct option; vmovss only requires AVX1. There is an AVX512F EVEX encoding of vmovss, but GAS won't use it unless a register involved is xmm16..31. GCC won't emit asm using those registers when you disable AVX512F with -mno-avx512f, or when you don't enable it in the first place, e.g. with something like -march=skylake or -march=znver2.

If you're still not sure, check the actual disassembly and machine code to see which prefix the instruction starts with:

  • a C5 or C4 byte: start of a 2- or 3-byte VEX prefix, AVX1 encoding.
  • a 62 byte: start of an EVEX prefix, AVX512F encoding.

For example, with this test source (avx.s):

.intel_syntax noprefix
vmovss xmm15, [rdi]
vmovss xmm15, [r11]
vmovss xmm16, [rdi]

Assemble with gcc -c avx.s and disassemble with objdump -drwC -Mintel avx.o:

0000000000000000 <.text>:
   0:   c5 7a 10 3f             vmovss xmm15,DWORD PTR [rdi]   # AVX1
   4:   c4 41 7a 10 3b          vmovss xmm15,DWORD PTR [r11]   # AVX1
   9:   62 e1 7e 08 10 07       vmovss xmm16,DWORD PTR [rdi]   # AVX512F

These are 2- and 3-byte VEX prefixes and a 4-byte EVEX prefix before the 10 opcode byte. (The ModRM bytes are different too, since xmm15 and xmm16 differ in their low 3 bits; xmm0 and xmm16 would differ only in the extra register bit in the prefix, not in the ModRM byte.)
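The register-bit point can be checked by hand. The following is a minimal sketch, not a full decoder: the function names are made up for illustration, and it assumes 64-bit code, a one-byte opcode directly after the prefix (true for the vmovss and vmovaps encodings shown here), and no other prefixes in front. It rebuilds the register number in the ModRM.reg field from the prefix bits, which are stored inverted.

```python
def modrm_reg_vex2(code: bytes) -> int:
    """Register number held in ModRM.reg for a 2-byte-VEX (C5) instruction."""
    assert code[0] == 0xC5
    r_bit = ((code[1] >> 7) & 1) ^ 1   # VEX.R, stored inverted: bit 3 of the register
    modrm = code[3]                    # layout assumed: C5, payload, opcode, ModRM
    return (r_bit << 3) | ((modrm >> 3) & 7)


def modrm_reg_evex(code: bytes) -> int:
    """Register number held in ModRM.reg for an EVEX (62) instruction."""
    assert code[0] == 0x62
    p0 = code[1]
    r_bit = ((p0 >> 7) & 1) ^ 1        # EVEX.R, stored inverted: bit 3
    r_hi = ((p0 >> 4) & 1) ^ 1         # EVEX.R', stored inverted: bit 4 (xmm16..31)
    modrm = code[5]                    # layout assumed: 62, P0, P1, P2, opcode, ModRM
    return (r_hi << 4) | (r_bit << 3) | ((modrm >> 3) & 7)


if __name__ == "__main__":
    print(modrm_reg_vex2(bytes.fromhex("c5 7a 10 3f")))           # vmovss xmm15,[rdi]
    print(modrm_reg_evex(bytes.fromhex("62 e1 7e 08 10 07")))     # vmovss xmm16,[rdi]
    print(modrm_reg_evex(bytes.fromhex("62 f1 7c 08 28 47 10")))  # {evex} vmovaps xmm0,[rdi+0x100]
```

For the two EVEX encodings, the ModRM.reg field is 0 in both cases; only the R' bit in the first payload byte (e1 vs f1) separates xmm16 from xmm0.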

GAS uses the AVX1 VEX encoding of vmovss and other instructions when possible. So you can count on instructions that have a non-AVX512F form to use that form whenever possible. This is how the GNU toolchain (used by GCC) makes -mno-avx512f work.
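To check a whole binary this way rather than one instruction at a time, the prefix test can be applied to objdump's text output. This is a rough sketch under some assumptions: the helper names are hypothetical, the input is 64-bit code disassembled with objdump -d -w (one instruction per line), and data embedded in .text may be misreported because objdump disassembles it as if it were code.

```python
import re

# "<address>:", then hex byte pairs (each followed by a space), then the disassembly text.
LINE_RE = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2} )+)\s*(.*)$")

# Segment-override and address-size prefixes may legally precede VEX/EVEX.
SKIPPABLE = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x67}


def classify(code: bytes) -> str:
    """Return 'EVEX', 'VEX' or 'other' for one instruction (64-bit mode assumed)."""
    i = 0
    while i < len(code) and code[i] in SKIPPABLE:
        i += 1
    if i == len(code):
        return "other"
    if code[i] == 0x62:
        return "EVEX"
    if code[i] in (0xC4, 0xC5):
        return "VEX"
    return "other"


def evex_lines(objdump_text: str) -> list:
    """Collect the disassembly text of every EVEX-encoded instruction."""
    hits = []
    for line in objdump_text.splitlines():
        m = LINE_RE.match(line)
        if m and classify(bytes.fromhex(m.group(1))) == "EVEX":
            hits.append(m.group(2).strip())
    return hits


SAMPLE = """\
0000000000000000 <.text>:
   0:   c5 7a 10 3f             vmovss xmm15,DWORD PTR [rdi]
   4:   c4 41 7a 10 3b          vmovss xmm15,DWORD PTR [r11]
   9:   62 e1 7e 08 10 07       vmovss xmm16,DWORD PTR [rdi]
"""

if __name__ == "__main__":
    for text in evex_lines(SAMPLE):
        print("EVEX:", text)
```

An empty result for a build made with -mno-avx512f is what the answer predicts; any hit is worth inspecting by hand.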

GAS prefers the VEX encoding even when the EVEX encoding would be shorter, e.g. when a [reg + constant] could use an AVX512 scaled disp8 (scaled by the element width) but the AVX1 encoding would need a 32-bit displacement that counts in bytes.

   f:   c5 7a 10 bf 00 01 00 00    vmovss xmm15,DWORD PTR [rdi+0x100]     # AVX1 [reg+disp32]
  17:   62 e1 7e 08 10 47 40       vmovss xmm16,DWORD PTR [rdi+0x100]     # AVX512 [reg + disp8*4]
  1e:   c5 78 28 bf 00 01 00 00    vmovaps xmm15,XMMWORD PTR [rdi+0x100]  # AVX1 [reg+disp32]
  26:   62 e1 7c 08 28 47 10       vmovaps xmm16,XMMWORD PTR [rdi+0x100]  # AVX512 [reg + disp8*16]

Note the last byte, or last 4 bytes, of each machine-code encoding: the AVX1 encodings use a 32-bit little-endian displacement of 0x100 bytes, while the AVX512 encodings use an 8-bit displacement of 0x40 dwords (vmovss) or 0x10 dqwords (vmovaps).
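The displacement arithmetic can be worked through in a few lines. This is an illustrative sketch only: the function names are hypothetical, the scale factor n is passed in by the caller (4 for vmovss and 16 for 128-bit vmovaps, as in the listing above), and the zero-displacement case, which needs no displacement byte at all, is ignored.

```python
def evex_disp8(disp: int, n: int):
    """Compressed disp8 value for an EVEX encoding, or None if disp32 is needed.

    The stored byte is disp / n, so disp must be a multiple of n and the
    quotient must fit in a signed 8-bit field.
    """
    if disp % n != 0:
        return None
    scaled = disp // n
    return scaled if -128 <= scaled <= 127 else None


def vex_disp_bytes(disp: int) -> int:
    """Displacement size in bytes for a VEX encoding, which counts in bytes."""
    return 1 if -128 <= disp <= 127 else 4


if __name__ == "__main__":
    print(vex_disp_bytes(0x100))        # VEX: 0x100 does not fit in a signed byte
    print(hex(evex_disp8(0x100, 4)))    # vmovss: stored as 0x100 / 4
    print(hex(evex_disp8(0x100, 16)))   # vmovaps xmm: stored as 0x100 / 16
```

A displacement that is not a multiple of the scale factor, such as 0x102 with n = 4, cannot use the compressed form and falls back to disp32 in the EVEX encoding as well.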

With an asm-source override such as {evex} vmovaps xmm0, [rdi+256], however, we can get the compact encoding even for "low" registers:

62 f1 7c 08 28 47 10    vmovaps xmm0,XMMWORD PTR [rdi+0x100]

GCC will of course not do that with -mno-avx512f.

Unfortunately, GCC and clang also miss that optimization when you do enable AVX512F, e.g. when compiling __m128 load(__m128 *p){ return p[16]; } with -O3 -march=skylake-avx512 (Godbolt). Use Godbolt's binary mode to see the machine code, or simply note the lack of an {evex} tag on that line of the compiler's asm output.
