How do I know if I can compile with FMA instruction sets?


Problem description

I have seen questions about how to use the FMA instruction set, but before I start using it I'd first like to know whether I can (that is, whether my processor supports it). I found a post saying that, on Linux, I need to look at the output of:

more /proc/cpuinfo

to find out. I get this:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 30
model name      : Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz
stepping        : 5
cpu MHz         : 2933.235
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5866.47
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual

The flags line seems the most relevant part, but I am not sure how to tell from that list whether the processor supports these instructions.

Does anybody know how to find that out? Thank you.

Solution

I assume you want to detect it in C/C++ at compile time.

The FP_FAST_FMA macro is not a reliable way to detect the FMA instruction set. It is defined in <math.h>/<cmath> if std::fma is faster than x*y+z, which is possible when std::fma is an intrinsic function backed by an FMA instruction. Otherwise std::fma falls back to a non-intrinsic library function, which is very slow. As of 2016, GCC's default glibc/libstdc++ defines this macro, but most other standard library implementations do not (including LLVM libc++, ICC's and MSVC's). That does not mean they fail to implement std::fma as an intrinsic where possible; they just do not define the macro.
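To see what a given toolchain actually advertises, a small C sketch along these lines can report which of the standard "fast fma" hint macros are defined. FP_FAST_FMA, FP_FAST_FMAF and FP_FAST_FMAL are the C99 macros from <math.h>; the function name fast_fma_hints is made up for this illustration.

```c
#include <math.h>

/* Hypothetical helper: returns a bitmask of the C99 "fast fma" hints
   that <math.h> defines for this build.
     bit 0: FP_FAST_FMA   (double)
     bit 1: FP_FAST_FMAF  (float)
     bit 2: FP_FAST_FMAL  (long double)
   A result of 0 does NOT prove that FMA instructions are unavailable;
   it only means the library does not advertise a fast fma(). */
unsigned fast_fma_hints(void)
{
    unsigned mask = 0;
#ifdef FP_FAST_FMA
    mask |= 1u;
#endif
#ifdef FP_FAST_FMAF
    mask |= 2u;
#endif
#ifdef FP_FAST_FMAL
    mask |= 4u;
#endif
    return mask;
}
```

With GCC and glibc, building with and without -mfma would be expected to change the result, whereas a library that never defines these macros returns 0 either way. That difference is why the macro works as a hint but not as a detector.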

Reliable FMA detection

To reliably detect FMA (or any instruction set) at compile time, you need to use instruction-set-specific macros. The compiler defines these macros based on the selected target architecture and/or instruction sets.

There is an __FMA__ macro for FMA/FMA3 support, and an __FMA4__ macro for AMD FMA4 support. GCC, clang and ICC define both.

Unfortunately, MSVC doesn't define any instruction-set-specific macros other than __AVX__ and __AVX2__.
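A minimal sketch of such a check in C, using only the __FMA__ and __FMA4__ macros named above. The function name fma_compile_time_support is hypothetical, and the result describes what the compiler was told to target, not the machine the program later runs on.

```c
/* Hypothetical helper: reports which FMA flavours the compiler has
   enabled for this translation unit, based on the predefined
   __FMA__ (FMA3) and __FMA4__ (AMD FMA4) macros. */
const char *fma_compile_time_support(void)
{
#if defined(__FMA__) && defined(__FMA4__)
    return "FMA3 and FMA4";
#elif defined(__FMA__)
    return "FMA3";
#elif defined(__FMA4__)
    return "FMA4";
#else
    return "none";
#endif
}
```

Building the same file with and without -mfma (or -march=haswell) in GCC or clang should switch the result between "FMA3" and "none".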

Cross-compiler FMA detection

On Intel processors, FMA was introduced together with AVX2 in Haswell.

For AMD processors, things are a little messy. FMA4 was introduced together with AVX and XOP in AMD Bulldozer. FMA3 (the equivalent of Intel's FMA) was introduced in AMD Piledriver. You can distinguish Piledriver from its predecessor Bulldozer at compile time by the presence of the FMA (__FMA__ macro) and BMI (__BMI__ macro) instruction sets. Unfortunately, MSVC defines neither macro.

Nevertheless, as with Intel processors, every AMD processor that supports AVX2 also supports FMA/FMA3.

If you want to detect across compilers whether the target architecture supports FMA/FMA3, check the __AVX2__ macro, since all major compilers (including MSVC) define it when AVX2 is enabled:

#if !defined(__FMA__) && defined(__AVX2__)
    #define __FMA__ 1
#endif
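One way the detection might be put to use is sketched below in C. It differs from the snippet above in two ways, both of which are choices made for this illustration: it defines a project-specific macro (HAVE_FMA3, a hypothetical name) rather than a compiler-reserved one, and it infers FMA from __AVX2__ only under MSVC, because GCC and clang can enable AVX2 without FMA (for example with -mavx2 alone). The function name mul_add is likewise invented here.

```c
/* Hypothetical feature macro; not defined by any compiler. */
#if !defined(HAVE_FMA3)
#  if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#    define HAVE_FMA3 1
#  endif
#endif

#ifdef HAVE_FMA3
#  include <immintrin.h>   /* _mm_fmadd_sd and friends */
#endif

/* Computes x*y + z. Uses a single fused instruction when FMA3 was
   enabled at compile time, and plain multiply-add otherwise. The
   fused path rounds once, the fallback may round twice. */
double mul_add(double x, double y, double z)
{
#ifdef HAVE_FMA3
    __m128d r = _mm_fmadd_sd(_mm_set_sd(x), _mm_set_sd(y), _mm_set_sd(z));
    return _mm_cvtsd_f64(r);
#else
    return x * y + z;
#endif
}
```

For example, mul_add(2.0, 3.0, 4.0) yields 10.0 on either path, since that result is exactly representable.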

Unfortunately, there is no reliable way to detect AMD FMA4 using only the __AVX__ and __AVX2__ macros.

Notes

FMA instructions are actually available in your program only if the compiler has enabled them. In GCC and clang you need to set the proper target architecture (like -march=haswell) or manually enable the FMA instruction set with the -mfma flag. ICC enables FMA automatically with the -xavx2 flag. MSVC enables FMA with the /arch:AVX2 /fp:fast /O2 options.
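The compiler flags decide what the binary may contain, not what the machine running it can execute. On Linux, the flags line of /proc/cpuinfo lists FMA3 as fma and FMA4 as fma4, and the Xeon X3470 output quoted in the question contains neither. A program can ask the same question at run time. The sketch below assumes GCC or clang on x86, where the __builtin_cpu_supports builtin accepts the feature name "fma". The wrapper name cpu_has_fma3 is invented here, and any other compiler or architecture gets a "don't know" result.

```c
/* Hypothetical wrapper around the GCC/clang x86 builtins.
   Returns  1 if the running CPU reports FMA3,
            0 if it does not,
           -1 if this build cannot tell (other compiler or architecture). */
int cpu_has_fma3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER) \
    && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                       /* populate CPU feature data */
    return __builtin_cpu_supports("fma") ? 1 : 0;
#else
    return -1;
#endif
}
```

A check like this is mainly useful when one binary has to run on machines with and without FMA and must choose a code path at startup.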

AMD announced that it will drop support for FMA4 in the future.
