Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

Question

So far I have managed to find out that:

  • SSE and SSE2 are required by Windows 8 and later (and, of course, by any 64-bit OS)
  • AVX is only supported by Windows 7 SP1 or later

Are there any caveats regarding using SSE3, SSSE3, SSE4.1, SSE 4.2, AVX2 and AVX-512 on Windows?

Some clarification: I need this to determine which OSes my program will run on if I use instructions from one of the SSE/AVX sets.

Recommended answer

Extensions that introduce new architectural state require special OS support, because the OS has to save/restore more data on context switches. Extensions that only add instructions operating on existing registers need nothing extra: from the OS's perspective, once it supports SSE, user-space code can run SSSE3 instructions without any further work on the OS's part.

SSE, AVX, and AVX512 are the extensions that introduced new architectural state.

  • SSE introduced the xmm registers (and MXCSR for rounding modes and FP exception state).
  • AVX introduced ymm (the lower halves of which are the old xmm registers).
  • AVX512 introduced zmm (extending the x/ymm registers), and also doubled the number of vector registers in 64-bit mode: zmm0-zmm31. x/y/zmm16..31 are only accessible with AVX-512 encodings of vector instructions (EVEX prefix), and thus, interestingly, can be used without requiring vzeroupper and aren't affected by it.
    The k0..k7 mask registers are also new in AVX-512; they are 64 bits wide (or 16 bits wide without AVX-512BW, as in Xeon Phi).

You check for CPU support for SSE or AVX the usual way, with the CPUID instruction.
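As a rough illustration of that CPUID check, here is a minimal sketch in C. It assumes GCC or Clang on x86 (the `<cpuid.h>` helpers `__get_cpuid` and `__get_cpuid_count`); with MSVC you would use the `__cpuid`/`__cpuidex` intrinsics instead. The struct and function names are hypothetical, chosen only for this example. Note that these bits only say what the hardware implements, not whether the OS has enabled the state.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>              /* GCC/Clang CPUID helpers */
#define HAVE_GNU_CPUID 1
#endif

/* Hypothetical container for the decoded hardware feature bits. */
struct cpu_simd_features {
    int sse, sse2, sse3, ssse3, sse41, sse42, avx, avx2;
};

/* Pure decoding step: takes the raw register values of CPUID leaf 1
 * (ECX and EDX) and CPUID leaf 7, subleaf 0 (EBX). */
struct cpu_simd_features decode_cpuid(uint32_t leaf1_ecx,
                                      uint32_t leaf1_edx,
                                      uint32_t leaf7_ebx)
{
    struct cpu_simd_features f;
    f.sse   = (leaf1_edx >> 25) & 1;   /* CPUID.1:EDX bit 25 */
    f.sse2  = (leaf1_edx >> 26) & 1;   /* CPUID.1:EDX bit 26 */
    f.sse3  = (leaf1_ecx >>  0) & 1;   /* CPUID.1:ECX bit 0  */
    f.ssse3 = (leaf1_ecx >>  9) & 1;   /* CPUID.1:ECX bit 9  */
    f.sse41 = (leaf1_ecx >> 19) & 1;   /* CPUID.1:ECX bit 19 */
    f.sse42 = (leaf1_ecx >> 20) & 1;   /* CPUID.1:ECX bit 20 */
    f.avx   = (leaf1_ecx >> 28) & 1;   /* CPUID.1:ECX bit 28 */
    f.avx2  = (leaf7_ebx >>  5) & 1;   /* CPUID.(7,0):EBX bit 5 */
    return f;
}

/* Returns 1 and fills *out when CPUID could be queried,
 * 0 on other compilers or architectures. */
int query_cpu_simd_features(struct cpu_simd_features *out)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    unsigned int eax7, ebx7, ecx7, edx7;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Leaf 7 may be missing on old CPUs; treat that as "no AVX2". */
    if (!__get_cpuid_count(7, 0, &eax7, &ebx7, &ecx7, &edx7))
        ebx7 = 0;
    *out = decode_cpuid(ecx, edx, ebx7);
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

Keeping the bit decoding separate from the CPUID query makes the decoding easy to test with made-up register values.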

To prevent silent data corruption when using a new extension on a multi-tasking OS that doesn't save/restore the new architectural state on context switches, SSE instructions fault as illegal instructions if the OS hasn't set an OS-support bit in a control register. So vector extensions "don't work" on OSes that don't know about saving/restoring the necessary state for that extension.

For SSE, there may not be any clean OS-independent way to detect that the OS has promised to save/restore SSE state on context switches by setting the CR4.OSFXSR, CR4.OSXMMEXCPT etc. bits, because even reading a control register is privileged, and there's no CPUID bit that reflects the setting. SSE support is so widespread that you'd have to be using a really ancient version (or homebrew) OS for this to be a problem.

For AVX, we don't need OS support to detect that AVX is usable (supported by hardware and enabled by the OS): User-space can run xgetbv and check the enabled-feature flags to see if the OS has enabled AVX instructions to run without faulting.

From Intel's introduction to AVX:

  • Verify that the operating system supports XGETBV using CPUID.1:ECX.OSXSAVE bit 27 = 1.
  • At the same time, verify that CPUID.1:ECX bit 28=1 (Intel AVX supported) and/or bit 25=1 (AES supported) ... (and other bits for FMA, AES, and PCLMULQDQ)
  • Issue XGETBV, and verify that the feature-enabled mask at bits 1 and 2 are 11b (XMM state and YMM state enabled by the operating system).
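The three steps quoted above could be sketched in C roughly as follows. This assumes GCC or Clang on x86 with an assembler that knows the `xgetbv` mnemonic; with MSVC the `_xgetbv` intrinsic plays the same role. All function and macro names here are made up for the example.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_X86 1
#endif

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* hardware implements AVX     */
#define XCR0_SSE_STATE     (1u << 1)   /* XMM state enabled by the OS */
#define XCR0_AVX_STATE     (1u << 2)   /* YMM state enabled by the OS */

/* Steps 1 and 2: both OSXSAVE and AVX must be set in CPUID.1:ECX. */
int cpuid_allows_avx(uint32_t leaf1_ecx)
{
    uint32_t need = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    return (leaf1_ecx & need) == need;
}

/* Step 3: bits 1 and 2 of XCR0 must both be set (11b). */
int xcr0_has_ymm_state(uint64_t xcr0)
{
    uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;
    return (xcr0 & need) == need;
}

#ifdef HAVE_GNU_X86
/* Reads XCR0. Only call this after OSXSAVE has been confirmed,
 * otherwise XGETBV itself raises an illegal-instruction fault. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}
#endif

/* Returns 1 if AVX is supported by the CPU and enabled by the OS. */
int avx_usable(void)
{
#ifdef HAVE_GNU_X86
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!cpuid_allows_avx(ecx))
        return 0;                       /* do not touch XGETBV */
    return xcr0_has_ymm_state(read_xcr0());
#else
    return 0;                           /* unknown platform: assume no */
#endif
}
```

The order matters: the OSXSAVE bit is checked first because XGETBV is not guaranteed to be executable without it.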

It may be easier to call an OS-provided function to detect OS support, instead of using inline asm or a feature-detect library to do all this. For example, Win7SP1 introduced GetEnabledXStateFeatures along with support for AVX CPUs. (It's unlikely or maybe impossible to find Win7SP1 running on a CPU without SSE, so for SSE you can just check CPUID and OS version.)
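A hedged sketch of that OS-provided route on Windows: the example below looks up `GetEnabledXStateFeatures` at run time with `GetProcAddress`, on the assumption that the program may also be started on a Windows version older than 7 SP1, where the export does not exist. The mask bits mirror the XCR0 layout (bit 1 for SSE state, bit 2 for AVX state); the helper names and the -1 return convention are inventions of this example.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Same bit positions as XCR0: bit 1 = SSE state, bit 2 = AVX state. */
#define FEATURE_MASK_SSE (1ULL << 1)
#define FEATURE_MASK_AVX (1ULL << 2)

/* Pure helper: does an enabled-features mask cover XMM and YMM state? */
int mask_has_avx(uint64_t mask)
{
    uint64_t need = FEATURE_MASK_SSE | FEATURE_MASK_AVX;
    return (mask & need) == need;
}

/* Returns 1 if Windows reports AVX state as enabled, 0 if it does not
 * (or the API is missing, i.e. older than Windows 7 SP1), and -1 when
 * not built for Windows. */
int os_reports_avx_enabled(void)
{
#ifdef _WIN32
    typedef DWORD64 (WINAPI *xstate_fn)(void);
    HMODULE k32 = GetModuleHandleA("kernel32.dll");
    xstate_fn fn;

    if (k32 == NULL)
        return 0;
    /* Resolve dynamically so the program still loads where the
     * export is absent. */
    fn = (xstate_fn)(void (*)(void))
            GetProcAddress(k32, "GetEnabledXStateFeatures");
    if (fn == NULL)
        return 0;
    return mask_has_avx((uint64_t)fn());
#else
    return -1;
#endif
}
```

This only covers the OS side; the CPUID bit for the specific instruction set (for example AVX2 or FMA) still has to be checked separately.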

This is also understood to be a promise that the OS's context switches will correctly save/restore the full state, although of course a buggy, malicious, or esoteric OS (perhaps cooperative multi-tasking?) could be different. For mainstream OSes including Windows, it does mean YMM registers will keep their values just like you'd expect.

The same is true for AVX512: you can check the CPUID feature bit for the instruction set, and check that the OS has promised to manage the new architectural state on context switches by enabling the right bits with XSETBV. (So you should check with XGETBV.) Check that the XGETBV result ANDed with 0xE6 equals 0xE6.
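To make the 0xE6 mask concrete, here is a small sketch of the AVX-512 check written as pure functions over register values. It assumes the values were already obtained elsewhere: CPUID.1:ECX, CPUID leaf 7 subleaf 0 EBX (where bit 16 is AVX512F), and XCR0 from XGETBV with ECX=0. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* 0xE6 = bits 1, 2, 5, 6, 7 of XCR0:
 *   bit 1  XMM state (SSE)
 *   bit 2  upper halves of YMM (AVX)
 *   bit 5  opmask registers k0..k7
 *   bit 6  upper halves of zmm0..zmm15
 *   bit 7  zmm16..zmm31 */
#define XCR0_AVX512_MASK   0xE6ULL

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* XGETBV usable          */
#define CPUID7_EBX_AVX512F (1u << 16)  /* AVX-512 Foundation bit */

/* OS side: has the OS enabled every state component AVX-512 needs? */
int xcr0_enables_avx512(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}

/* Combined check for AVX512F only. Other AVX-512 subsets (BW, DQ,
 * VL, ...) have their own CPUID bits and are not covered here. */
int avx512f_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
    if (!(leaf1_ecx & CPUID1_ECX_OSXSAVE))
        return 0;       /* xcr0 would not have been readable */
    if (!(leaf7_ebx & CPUID7_EBX_AVX512F))
        return 0;       /* hardware lacks AVX512F */
    return xcr0_enables_avx512(xcr0);
}
```

For instance, an XCR0 value of 0x07 (x87, SSE, and AVX state only) fails the check, while 0xE7 passes.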
