Disabling AVX2 in CPU for testing purposes

Question

I've got an application that requires AVX2 to work correctly. A check runs at application startup to verify that the CPU supports AVX2 instructions. I would like to confirm that this check works, but I only have CPUs that have AVX2. Is there a way to temporarily turn it off for testing purposes? Or to somehow emulate a different CPU?

Recommended answer

Yes, use an "emulation" (or dynamic recompilation) layer like Intel's Software Development Emulator (SDE), or maybe QEMU.

SDE is closed-source freeware, and very handy both for testing AVX512 code on old CPUs and for simulating old CPUs to check that you don't accidentally execute instructions that are too new.

Example: I happened to have a binary that unconditionally uses an AVX2 vpmovzxwq load instruction (for a function I was testing). It runs fine natively on my Skylake CPU, but SDE has a -snb option to emulate a Sandybridge, both in what CPUID reports and by actually checking every instruction.

 $ sde64 -snb -- ./mask
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (SANDYBRIDGE): 0x401005: vpmovzxwq ymm2, qword ptr [rip+0xff2]
Image: /tmp/mask+0x5 (in multi-region image, region# 1)
Instruction bytes are: c4 e2 7d 34 15 f2 0f 00 00 
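
If you want to turn that kind of run into an automated test, one option is to scan SDE's output for the error line. This is only a sketch: the regular expression is modeled on the single SDE-ERROR line quoted above, and the function name parse_sde_errors is made up for illustration. Other SDE versions may word the message differently.

```python
import re

# Pattern modeled on the one SDE-ERROR line quoted above (assumption:
# other SDE versions may format this message differently).
SDE_ERROR_RE = re.compile(
    r"SDE-ERROR: Executed instruction not valid for specified chip "
    r"\((?P<chip>[^)]+)\): (?P<address>0x[0-9a-fA-F]+): (?P<instruction>.+)"
)


def parse_sde_errors(output):
    """Return (chip, address, instruction) for every matching SDE-ERROR line."""
    found = []
    for line in output.splitlines():
        match = SDE_ERROR_RE.search(line)
        if match:
            found.append(
                (
                    match.group("chip"),
                    match.group("address"),
                    match.group("instruction").strip(),
                )
            )
    return found
```

To use it, capture the output of something like subprocess.run(["sde64", "-snb", "--", "./mask"], capture_output=True, text=True) and pass both stdout and stderr through parse_sde_errors, since I haven't checked which stream SDE writes the message to.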

There are options to emulate CPUs as old as -quark, -p4 (SSE2), or Core 2 Merom (-mrm), to as new as IceLake-Server (-icx) or Tremont (-tnt). (And Xeon Phi CPUs like KNL and KNM.)
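
To run the same binary against several of these chips, a small helper can build the command lines. This is a sketch under assumptions: the flags are only the ones named above, the labels are informal, and sde_command is a hypothetical helper name, not part of SDE.

```python
# Chip flags taken from the text above; labels are informal descriptions.
CHIP_FLAGS = {
    "-quark": "Quark",
    "-p4": "Pentium 4 (SSE2)",
    "-mrm": "Core 2 Merom",
    "-snb": "Sandybridge",
    "-icx": "IceLake-Server",
    "-tnt": "Tremont",
}


def sde_command(chip_flag, program, *program_args, sde="sde64"):
    """Build an argv list like: sde64 -snb -- ./mask"""
    if chip_flag not in CHIP_FLAGS:
        raise ValueError("unknown chip flag: " + chip_flag)
    return [sde, chip_flag, "--", program, *program_args]


# Print the command for each chip; hand each list to subprocess.run()
# to actually execute it.
for flag, label in CHIP_FLAGS.items():
    print(label + ": " + " ".join(sde_command(flag, "./mask")))
```

For the AVX2 startup check in the question, the runs that matter are the pre-Haswell chips such as -snb, where the program should exit through its "AVX2 required" path instead of triggering an SDE-ERROR.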

It runs pretty quickly, using dynamic recompilation (JIT) so code using only instructions that are supported natively can run at basically native speed, I think.

It also has instrumentation options (like -mix to dump the instruction mix), and options to control the JIT more closely. I think you could maybe get it to not report AVX2 in CPUID, but still let AVX2 instructions run without faulting.

Or probably emulate a CPU that supports AVX2 but not FMA (there is a real CPU like this from Via, unfortunately). Or combinations that no real CPU has, like AVX2 but not popcnt, or BMI1/BMI2 but not AVX. But I haven't looked into how to do that.

The basic options listed by sde -help only let you select specific Intel CPUs and check for potentially slow SSE/AVX transitions (code without correct vzeroupper usage), plus a few other things.

One important test-case that SDE is missing is AVX+FMA without AVX2 (AMD Piledriver / Steamroller, i.e. most AMD FX-series CPUs). It's easy to forget and use an AVX2 shuffle in code that's supposed to be AVX1+FMA3, and some compilers (like MSVC) won't catch this at compile time the way gcc -march=bdver2 would. (Bulldozer only has AVX + FMA4, not FMA3, because Intel changed their plans after it was too late for AMD to redesign.)

If you just want CPUID to not report the presence of AVX2 (and FMA?) so your code uses its AVX1 or non-AVX versions of functions, you can do that with most VMs.
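
Inside a Linux x86 guest, a quick way to confirm that the VM's masking took effect is to look at the flags line of /proc/cpuinfo, which the kernel fills in from CPUID. A minimal sketch, with hypothetical function names, assuming the usual "flags : ..." layout:

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (e.g. not an x86 Linux system)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags of the machine this runs on (Linux only)."""
    with open(path) as f:
        return cpu_flags(f.read())
```

In the guest, "avx2" in read_cpu_flags() should then be False while "avx" may still be True, depending on what you masked.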

For AVX instructions to run without faulting, a bit in a control register has to be set. (So this works like a promise by the OS that it will correctly save/restore the new architectural state of YMM upper halves). So disabling AVX in CPUID will give you a VM instance where AVX instructions fault. (At least 256-bit instructions? I haven't tried this to see if 128-bit AVX instructions can still execute in this state on HW that supports AVX.)
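
This is also why a robust startup check looks at more than the AVX2 bit. The usual sequence is: CPUID leaf 1 ECX must have OSXSAVE (bit 27) and AVX (bit 28) set, XCR0 (read with XGETBV) must have bits 1 and 2 set to show the OS saves XMM and YMM state, and CPUID leaf 7 subleaf 0 EBX must have AVX2 (bit 5) set. The sketch below only models that decision logic. Python can't execute CPUID or XGETBV itself, so the register values are plain parameters, and the function names are made up for illustration. A real check would get the values from C intrinsics or inline assembly.

```python
OSXSAVE_BIT = 1 << 27  # CPUID.1:ECX, OS has enabled XSAVE/XGETBV
AVX_BIT = 1 << 28      # CPUID.1:ECX, CPU supports AVX
AVX2_BIT = 1 << 5      # CPUID.(EAX=7,ECX=0):EBX, CPU supports AVX2
XCR0_XMM_YMM = 0b110   # XCR0 bit 1 (XMM state) and bit 2 (YMM upper halves)


def avx_usable(leaf1_ecx, xcr0):
    """True if AVX is reported by CPUID and its state is enabled by the OS."""
    if not leaf1_ecx & OSXSAVE_BIT:
        return False  # XGETBV is not usable, so xcr0 can't be trusted
    if not leaf1_ecx & AVX_BIT:
        return False
    return (xcr0 & XCR0_XMM_YMM) == XCR0_XMM_YMM


def avx2_usable(leaf1_ecx, leaf7_ebx, xcr0):
    """True only if AVX is usable and CPUID also reports AVX2."""
    return avx_usable(leaf1_ecx, xcr0) and bool(leaf7_ebx & AVX2_BIT)
```

With a VM that masks only the AVX2 bit, avx2_usable() goes False because of leaf7_ebx, while avx_usable() can stay True, which is the case that exercises an AVX1 fallback path.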
